DOCKET NO.: 22221/1030 (RU-339) 
EXPRESS MAIL NO.: EL709322449US 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

UTILITY PATENT APPLICATION TRANSMITTAL FORM 
(only for new nonprovisional applications under 37 CFR 1.53(b))

ASSISTANT COMMISSIONER FOR PATENTS 
Washington, D.C. 20231 
BOX: PATENT APPLICATION 

SIR: 

Transmitted herewith for filing is the patent application (including Specification, Claims, and 
Abstract, 130 pages) of: 

Inventors : Michael E. O'Donnell, Alexander Yuzhakov, Olga Yurieva, David Jeruzalmi, Irina 
Bruck, and John Kuriyan 

For : ENZYMES DERIVED FROM THERMOPHILIC ORGANISMS THAT
FUNCTION AS A CHROMOSOMAL REPLICASE, PREPARATION AND USE
THEREOF

**If a CONTINUING APPLICATION, please mark where appropriate and supply the requisite
information below and in a preliminary amendment: 

[ ] continuation [ ] divisional [X] Continuation-In-Part (CIP) 
of prior application Serial No. 

Prior application information: Examiner : 
Art Unit : 

Enclosed are: 

[X] 83 sheets of Formal drawings. 

[ ] Signed Combined Declaration and Power of Attorney ( pages). 

[ ] Copy of signed Combined Declaration and Power of Attorney ( pages) from a prior
application (1.63(d)) (for continuation/divisional).

[ ] Signed statement deleting inventor(s) named in prior application ( pages) (1.63(d)(2) and
1.33(b)).

[ ] Incorporation By Reference: The entire disclosure of the prior application, from which a copy 
of the oath or declaration is supplied herewith, is considered as being part of the disclosure of the 
enclosed application and is hereby incorporated by reference therein. 

[ ] Assignment ( pages) of the invention to . 

[ ] Certified copy of a foreign priority document. 

[ ] Associate power of attorney. 

[X] Applicants claim small entity status. (See 37 CFR 1.27.) 





[ ] Preliminary Amendment ( pages).

[ ] Information Disclosure Statement, form PTO-1449 ( pages) and references. 

[X] UNSIGNED Combined Declaration and Power of Attorney (3 pages). 

[X] Statement in Accordance with 37 CFR § 1.821(f) and computer readable 3.5" Diskette. 

[X] Sequence Listing (165 pages). 

[X] A self-addressed, prepaid postcard acknowledging receipt. 
[ ] Other: 

The Filing fee has been calculated as shown below: 



                      (Col. 1)     (Col. 2)     SMALL ENTITY          LARGE ENTITY
FOR:                  NO. FILED    NO. EXTRA    RATE      FEE    OR   RATE      FEE
BASIC FEE             XXXXXXXX     XXXXXXXX     XXXX      $355   OR   XXXX      $710
TOTAL CLAIMS          70 - 20 =    50           x 9 =     $450   OR   x 18 =    $
INDEP CLAIMS          2 - 3 =      0            x 40 =    $0     OR   x 80 =    $
[ ] MULTIPLE DEPENDENT CLAIM PRESENTED          x 135 =   $      OR   x 270 =   $
                                                TOTAL     $805   OR   TOTAL     $

*If the Total Claims are less than 20 and Indep. Claims
are less than 3, enter "0" in Col. 2
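The small entity total in the fee table can be checked with a short calculation. The following is only an illustrative sketch: the function name and parameter names are assumptions introduced here, and the rates are the ones printed on this form, not a current fee schedule.

```python
def filing_fee(total_claims, indep_claims, basic_fee,
               per_extra_claim, per_extra_indep, multiple_dependent_fee=0):
    """Hypothetical helper mirroring the columns of the fee table."""
    # Col. 2: claims in excess of 20 total and 3 independent, never negative.
    extra_total = max(total_claims - 20, 0)
    extra_indep = max(indep_claims - 3, 0)
    return (basic_fee
            + extra_total * per_extra_claim
            + extra_indep * per_extra_indep
            + multiple_dependent_fee)


# Small entity column of the form: 70 total claims, 2 independent claims.
small_entity_total = filing_fee(70, 2, basic_fee=355,
                                per_extra_claim=9, per_extra_indep=40)
print(small_entity_total)
```

With the values entered on the form, the basic fee of $355 plus 50 extra claims at $9 each gives the $805 shown as the small entity total.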



[ ] Please charge my Deposit Account No. in the amount of $ . A duplicate 

copy of this sheet is enclosed. 

[X] A check in the amount of $805.00 to cover the filing fee is enclosed. 

[X] The Commissioner is hereby authorized to charge any additional fees which may be required, or 
credit any overpayment to Deposit Account No. 14-1138. A duplicate copy of this sheet is
enclosed. 

[X] Address all future communications to: 

Michael L. Goldman 
NIXON PEABODY LLP 
Clinton Square, P.O. Box 31051 
Rochester, New York 14603 

Date : November 21, 2000

Edwin V. Merkel 
Registration No. 40,087 

NIXON PEABODY LLP 
Clinton Square, P.O. Box 31051 
Rochester, New York 14603 
Telephone: (716)263-1128 
Facsimile: (716)263-1600 





EXPRESS MAIL CERTIFICATE 
DOCKET NO. : 22221/1030 (RU-339) 

APPLICANTS : Michael E. O'Donnell, Alexander Yuzhakov, Olga Yurieva, David 
Jeruzalmi, Irina Bruck, and John Kuriyan 

TITLE : ENZYMES DERIVED FROM THERMOPHILIC ORGANISMS
THAT FUNCTION AS A CHROMOSOMAL REPLICASE,
PREPARATION AND USE THEREOF

Certificate is attached to the Patent Application Including Specification, 
Claims, and Abstract (130 pages), Unsigned Combined Declaration and Power of 
Attorney (3 pages), and Sequence Listing (165 pages) of the above-named application. 

"EXPRESS MAIL" NUMBER: EL709322449US 
DATE OF DEPOSIT: November 21, 2000 

I hereby certify that this paper or fee is being deposited with the United States 
Postal Service "Express Mail Post Office to Addressee" service under 37 CFR 1.10 on the 
date indicated above and is addressed to the Assistant Commissioner for Patents, 
Washington, D.C. 20231, Box: Patent Application. 

Jo Ann Whalen 

(Typed or printed name of person mailing 
paper or fee) 

(Signature of person mailing paper or fee)






TITLE: ENZYMES DERIVED FROM THERMOPHILIC 

ORGANISMS THAT FUNCTION AS A 
CHROMOSOMAL REPLICASE, 
PREPARATION AND USE THEREOF 



INVENTORS: Michael E. O'Donnell, Alexander Yuzhakov, Olga 
Yurieva, David Jeruzalmi, Irina Bruck, and John 
Kuriyan 



DOCKET NO.: 22221/1030 (RU-339) 




ENZYMES DERIVED FROM THERMOPHILIC ORGANISMS THAT 
FUNCTION AS A CHROMOSOMAL REPLICASE 

The present application is a continuation-in-part of U.S. Patent
Application Serial No. 09/057,416 filed April 8, 1998, which claims the benefit of
U.S. Patent Application Serial No. 08/823,407 filed April 8, 1997, and U.S.
Provisional Patent Application Serial No. 60/143,202 filed April 8, 1997, all of which
are hereby incorporated by reference.

The present invention was made with funding from National Institutes
of Health Grant No. GM38839. The United States Government may have certain
rights in this invention.

FIELD OF THE INVENTION 

The present invention relates to thermostable DNA polymerases and,
more particularly, to such polymerases as can serve as chromosomal replicases and
are derived from thermophilic bacteria. More particularly, the invention extends to
DNA polymerase III-type enzymes from thermophilic bacteria, including Aquifex
aeolicus, Thermus thermophilus, Thermotoga maritima, and Bacillus
stearothermophilus, as well as purified, recombinant or non-recombinant subunits
thereof and their use, and to isolated DNA coding for such polymerases and their
subunits. Such DNA is obtained from the respective genes (e.g., dnaX, holA, holB,
dnaA, dnaN, dnaQ, dnaE, ssb, etc.) of various thermophilic eubacteria, including but
not limited to Thermus thermophilus, Aquifex aeolicus, Thermotoga maritima, and
Bacillus stearothermophilus.

BACKGROUND OF THE INVENTION 

Thermostable DNA polymerases have been disclosed previously as set
forth in U.S. Patent No. 5,192,674 to Oshima et al., U.S. Patent Nos. 5,322,785 and
5,352,778 to Comb et al., U.S. Patent No. 5,545,552 to Mathur, and others. All of the
noted references recite the use of polymerases as important catalytic tools in the
practice of molecular cloning techniques such as polymerase chain reaction (PCR).
Each of the references states that a drawback of the extant polymerases is their
limited thermostability and consequently limited useful life in PCR. Such
limitations also manifest themselves in the inability to obtain extended lengths of
nucleotides and, in the instance of Taq polymerase, in the lack of 3' to 5' exonuclease
activity and the resulting inability to excise misinserted nucleotides (Perrino,
1990).

More generally, such polymerases, including those disclosed in the
referenced patents, are of the Polymerase I variety as they are often 90-95kDa in size
and may have 5' to 3' exonuclease activity. They define a single subunit with
concomitant limits on their ability to hasten the amplification process and to promote
the rapid preparation of longer strands of DNA.

Chromosomal replicases are composed of several subunits in all
organisms (Kornberg and Baker, 1992). In keeping with the need to replicate long
chromosomes, replicases are rapid and highly processive multiprotein machines.
Cellular replicases are classically comprised of three components: a clamp, a clamp
loader, and the DNA polymerase (reviewed in Kelman and O'Donnell, 1995;
McHenry, 1991). For purposes of the present invention, the foregoing components
also serve as a broad definition of a "Pol III-type enzyme".

DNA polymerase III holoenzyme (Pol III holoenzyme) is the
multi-subunit replicase of the E. coli chromosome. Pol III holoenzyme is
distinguished from Pol I type DNA polymerases by its high processivity (>50 kbp)
and rapid rate of synthesis (750 nts/s) (reviewed in Kornberg and Baker, 1992;
Kelman and O'Donnell, 1995). The high processivity and speed are rooted in a ring
shaped subunit, called β, that encircles DNA and slides along it while tethering the
Pol III holoenzyme to the template (Stukenberg et al., 1991; Kong et al., 1992). The
ring shaped β clamp is assembled around DNA by the multisubunit clamp loader,
called γ complex. The γ complex couples the energy of ATP hydrolysis to the
assembly of the β clamp onto DNA. This γ complex, which functions as a clamp
loader, is an integral component of the Pol III holoenzyme particle. A brief overview
of the organization of subunits within the holoenzyme and their function follows.

Pol III holoenzyme consists of 10 different subunits, some of which are
present in multiple copies for a total of 18 polypeptide chains (Onrust et al., 1995).
The organization of these subunits in the holoenzyme particle is illustrated in Fig. 1.
As depicted in the diagram, the subunits of the holoenzyme can be grouped
functionally into three components: 1) the DNA polymerase III core is the catalytic
unit and consists of the α (DNA polymerase), ε (3'-5' exonuclease), and θ subunits
(McHenry and Crow, 1979), 2) the β "sliding clamp" is the ring shaped protein that
secures the core polymerase to DNA for processivity (Kong et al., 1992), and 3) the 5
protein γ complex (γδδ'χψ) is the "clamp loader" that couples ATP hydrolysis to
assembly of β clamps around DNA (O'Donnell, 1987; Maki et al., 1988). A dimer of
the τ subunit acts as a "macromolecular organizer" holding together two molecules of
core (Studwell-Vaughan and O'Donnell, 1991; Low et al., 1976) and one molecule of
γ complex forming the Pol III* subassembly (Onrust et al., 1995). This organizing
role of τ to form Pol III* is indicated in the center of Fig. 1. Two β dimers associate
with the two cores within Pol III* to form the holoenzyme, which is capable of
replicating both strands of duplex DNA simultaneously (Maki et al., 1988).

The DNA polymerase III holoenzyme assembles onto a primed
template in two distinct steps. In the first step, the γ complex assembles the β clamp
onto the DNA. The γ complex and the core polymerase utilize the same surface of the
β ring and they cannot both utilize it at the same time (Naktinis et al., 1996). Hence,
in the second step the γ complex moves away from β, thus allowing access of the core
polymerase to the β clamp for processive DNA synthesis. The γ complex and core
remain attached to each other during this switching process by the τ subunit organizer.

The γ complex consists of 5 different subunits (γ2-4δ1δ'1χ1ψ1). An
overview of the mechanism of the clamp loading process follows. The δ subunit is
the major touch point to the β clamp and leads to ring opening, but δ is buried within
γ complex such that contact with β is prevented (Naktinis et al., 1995). The γ subunit
is the ATP interactive protein but is not an ATPase by itself (Tsuchihashi and
Kornberg, 1989). The δ' subunit bridges the δ and γ subunits resulting in a γδδ'
complex that exhibits DNA dependent ATPase activity and is competent to assemble
clamps on DNA (Onrust et al., 1991). Upon binding of ATP to γ, a change in the
conformation of the complex exposes δ for interaction with β (Naktinis et al., 1995).
The function of the smaller subunits, χ and ψ, is to contact SSB (through χ), thus
promoting clamp assembly and high processivity during replication (Kelman and
O'Donnell, 1995).






The three component Pol III-type enzyme in eukaryotes contains a
clamp that has the same shape as E. coli β, but instead of a homodimer it is a
homotrimer. This homotrimeric ring, called PCNA (proliferating cell nuclear
antigen), has 6 domains like β, but instead of each PCNA monomer being composed
of 3 domains and dimerizing to form a 6 domain ring (e.g., like β), the PCNA
monomer has 2 domains and it trimerizes to form a 6 domain ring (Krishna et al.,
1994; Kuriyan and O'Donnell, 1993). The chain fold of the domains is the same in
prokaryotes (β) and eukaryotes (PCNA); thus, the rings have the same overall
6-domain ring shape. The clamp loader of the eukaryotic Pol III-type replicase is
called RFC (Replication factor C) and it consists of subunits having homology to the
γ and δ' subunits of the E. coli γ complex (Cullmann et al., 1995). The eukaryotic
DNA polymerase III-type enzyme contains either of two DNA polymerases, DNA
polymerase δ and DNA polymerase ε (Bambara and Jessee, 1991; Linn, 1991;
Sugino, 1995). It is entirely conceivable that yet other types of DNA polymerases can
function with either a PCNA or β clamp to form a Pol III-type enzyme (for example,
DNA polymerase II of E. coli functions with the β subunit placed onto DNA by the γ
complex clamp loader) (Hughes et al., 1991; Bonner et al., 1992). The bacteriophage
T4 also utilizes a Pol III-type 3-component replicase. The clamp is a homotrimer like
PCNA, called gene 45 protein (Young et al., 1992). The gene 45 protein forms the
same 6-domain ring structure as β and PCNA (Moarefi et al., 2000). The clamp
loader is a complex of two subunits called the gene 44/62 protein complex. The DNA
polymerase is the gene 43 protein and it is stimulated by the gene 45 sliding clamp
when it is assembled onto DNA by the 44/62 protein clamp loader. The Pol III-type
enzyme may be either bound together into one particle (e.g., E. coli Pol III
holoenzyme), or its three components may function separately (like the eukaryotic Pol
III-type replicases).
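The domain arithmetic behind the shared six-domain ring can be laid out as a small table. The sketch below is illustrative only: the class and field names are assumptions introduced here, and the two-domain figure for the T4 gene 45 protein is inferred from the statement that it is a trimer forming the same six-domain ring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlidingClamp:
    """Hypothetical record for one sliding clamp described in the text."""
    name: str
    subunits_per_ring: int
    domains_per_subunit: int

    @property
    def domains_per_ring(self) -> int:
        # Total domains in the closed ring.
        return self.subunits_per_ring * self.domains_per_subunit


clamps = [
    SlidingClamp("E. coli beta", subunits_per_ring=2, domains_per_subunit=3),
    SlidingClamp("eukaryotic PCNA", subunits_per_ring=3, domains_per_subunit=2),
    # Inferred: trimer forming a six-domain ring implies two domains each.
    SlidingClamp("T4 gene 45 protein", subunits_per_ring=3, domains_per_subunit=2),
]

for clamp in clamps:
    print(f"{clamp.name}: {clamp.subunits_per_ring} x "
          f"{clamp.domains_per_subunit} = {clamp.domains_per_ring} domains")
```

Each entry multiplies out to six domains, which is the sense in which the dimeric and trimeric clamps share one overall ring architecture.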

There is an early report on separation of three DNA polymerases from
T.th. cells; however, each polymerase form was reminiscent of the preexisting types of
DNA polymerase isolated from thermophiles in that each polymerase was in the
110,000-120,000 range and lacked 3'-5' exonuclease activity (Ruttimann et al., 1985).
These are well below the molecular weight of Pol III-type complexes that contain, in
addition to the DNA polymerase subunit, other subunits such as γ and τ. Although the
three polymerases displayed some differences in activity (column elution behavior,
and optimum divalent cation, template, and temperatures), it seems likely that these
three forms were either different repair type polymerases or derivatives of one repair
enzyme (e.g., Pol I) that was modified by post translational modification(s) that
altered their properties (e.g., phosphorylation, methylation, proteolytic clipping of
residues that alter activity, or association with different ligands such as a small protein
or contaminating DNA). Despite this previous work, it remained to be demonstrated
that thermophiles harbor a Pol III-type enzyme that contains multiple subunits such as
γ and/or τ, functions with a sliding clamp accessory protein, or can extend a
primer rapidly and processively over a long stretch (>5kb) of ssDNA (Ruttimann et
al., 1985).

Previously, it was not known what polymerase thermophilic bacteria
used to replicate their chromosome since only Pol I type enzymes have been reported
from thermophiles. By distinction, chromosomal replicases, such as Polymerase III,
identified in E. coli, if available in a thermostable bacterium, with all its accessory
subunits, could provide a great improvement over the Polymerase I type enzymes, in
that they are generally much more efficient (about 5 times faster) and much more
highly processive. Hence, one may expect faster and longer chain production in PCR,
and higher quality of DNA sequencing ladders. Clearly, the ability to practice such
synthetic techniques as PCR would be enhanced by the methods disclosed herein for
obtaining genes and subunits of DNA polymerase III holoenzyme from thermophilic
sources.
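The practical meaning of "about 5 times faster" can be illustrated with a rough extension-time estimate. The following is a minimal sketch under stated assumptions: it uses the 750 nts/s rate quoted earlier for the E. coli Pol III holoenzyme, derives a Pol I type rate purely from the stated factor of 5, and assumes uninterrupted synthesis with no dissociation. The constant and function names are illustrative, and the rates are not measurements for any thermophilic enzyme.

```python
# Rate quoted in the Background for E. coli Pol III holoenzyme.
POL_III_RATE_NT_PER_S = 750.0
# "about 5 times faster" than Pol I type enzymes, per the text.
SPEED_RATIO = 5.0
# Assumption: Pol I type rate derived only from the ratio above.
POL_I_TYPE_RATE_NT_PER_S = POL_III_RATE_NT_PER_S / SPEED_RATIO


def extension_time_s(template_nt: int, rate_nt_per_s: float) -> float:
    """Idealized time to copy a template at a constant rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return template_nt / rate_nt_per_s


for length_nt in (5_000, 50_000):
    t_pol3 = extension_time_s(length_nt, POL_III_RATE_NT_PER_S)
    t_pol1 = extension_time_s(length_nt, POL_I_TYPE_RATE_NT_PER_S)
    print(f"{length_nt} nt: Pol III-type {t_pol3:.1f} s, Pol I type {t_pol1:.1f} s")
```

The estimate ignores processivity, so it understates the advantage on long templates, where a less processive enzyme would also lose time to repeated dissociation and rebinding.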

The present invention is directed to achieving these objectives and 
overcoming the various deficiencies in the art. 

SUMMARY OF THE INVENTION 

In accordance with the present invention, DNA Polymerase III-type
enzymes as defined herein are disclosed that may be isolated and purified from a
thermophilic bacterial source, that display rapid synthesis characteristic of a
chromosomal replicase, and that possess all of the structural and processive
advantages sought and recited above. More particularly, the invention extends to
thermostable Polymerase III-type enzymes derived from thermophilic bacteria that
exhibit the ability to extend a primer over a long stretch (>5kb) of ssDNA at elevated
temperature, the ability to be stimulated by a cognate sliding clamp (e.g., β) of the
type that is assembled on DNA by a 'clamp' loader (e.g., γ complex), and have clamp
loading subunits that show DNA stimulated ATPase activity at elevated temperature
and/or ionic strength. Representative thermophile polymerases include those isolated
from the thermophilic eubacteria Aquifex aeolicus (A.ae. polymerase) and other
members of the Aquifex genus; Thermus thermophilus (T.th. polymerase), Thermus
flavus (Tfl/Tub polymerase), Thermus ruber (Tru polymerase), Thermus brockianus
(DYNAZYME™ polymerase), and other members of the Thermus genus; Bacillus
stearothermophilus (B.st. polymerase) and other members of the Bacillus genus;
Thermoplasma acidophilum (Tac polymerase) and other members of the
Thermoplasma genus; and Thermotoga neapolitana (Tne polymerase; see WO
96/10640 to Chatterjee et al.), Thermotoga maritima (Tma polymerase; see U.S.
Patent No. 5,374,553 to Gelfand et al.), and other species of the Thermotoga genus
(Tsp polymerase). In a preferred embodiment, the thermophilic bacteria comprise
species of Aquifex, Thermus, Bacillus, and Thermotoga, and particularly A.ae., T.th.,
B.st., and T.ma.

A particular Polymerase III-type enzyme in accordance with the
invention may include at least one of the following sub-units:

A. a γ subunit having an amino acid sequence corresponding to
SEQ. ID. Nos. 4 or 5 (T.th.);

B. a τ subunit having an amino acid sequence corresponding to
SEQ. ID. No. 2 (T.th.), SEQ. ID. No. 120 (A.ae.), SEQ. ID. No. 142 (T.ma.) or SEQ.
ID. No. 182 (B.st.);

C. an ε subunit having an amino acid sequence corresponding to
SEQ. ID. No. 95 (T.th.), SEQ. ID. No. 128 (A.ae.), or SEQ. ID. No. 140 (T.ma.);

D. an α subunit including an amino acid sequence corresponding to
SEQ. ID. No. 87 (T.th.), SEQ. ID. No. 118 (A.ae.), SEQ. ID. No. 138 (T.ma.), or
SEQ. ID. No. 184 (PolC which has both α and ε activity, B.st.);

E. a β subunit having an amino acid sequence corresponding to
SEQ. ID. No. 107 (T.th.), SEQ. ID. No. 122 (A.ae.), SEQ. ID. No. 144 (T.ma.), or
SEQ. ID. No. 174 (B.st.);

F. a δ subunit having an amino acid sequence corresponding to
SEQ. ID. No. 158 (T.th.), SEQ. ID. No. 124 (A.ae.), SEQ. ID. No. 146 (T.ma.) or
SEQ. ID. No. 178 (B.st.);

G. a δ' subunit having an amino acid sequence corresponding to
SEQ. ID. No. 156 (T.th.), SEQ. ID. No. 126 (A.ae.), SEQ. ID. No. 148 (T.ma.) or
SEQ. ID. No. 180 (B.st.);

variants, including allelic variants, muteins, analogs and fragments of
any of subparts (A) through (G), and compatible combinations thereof, capable of
functioning in DNA amplification and sequencing.

The invention also extends to the genes that correspond to and can
code on expression for the subunits set forth above, and accordingly includes the
following: dnaX, holA, holB, dnaQ, dnaE, dnaN, and ssb, as well as conserved
variants and active fragments thereof.

Accordingly, the Polymerase III-type enzyme of the present invention
comprises at least one gene encoding a subunit thereof, which gene is selected from
the group consisting of dnaX, holA, holB, dnaQ, dnaE and dnaN, and combinations
thereof. More particularly, the invention extends to the nucleic acid molecule
encoding the γ and τ subunits, and includes the dnaX gene which has a nucleotide
sequence as set forth herein, as well as conserved variants, active fragments and
analogs thereof. Likewise, the nucleotide sequences encoding the α subunit (dnaE
gene), the ε subunit (dnaQ gene), the β subunit (dnaN gene), the δ subunit (holA
gene), and the δ' subunit (holB gene) each comprise the nucleotide sequences as set
forth herein, as well as conserved variants, active fragments and analogs thereof.
Those nucleotide sequences for T.th. are as follows: dnaX (SEQ. ID. No. 3), dnaE
(SEQ. ID. No. 86), dnaQ (SEQ. ID. No. 94), dnaN (SEQ. ID. No. 106), holA (SEQ.
ID. No. 157), and holB (SEQ. ID. No. 155). Those nucleotide sequences for A.ae. are
as follows: dnaX (SEQ. ID. No. 119), dnaE (SEQ. ID. No. 117), dnaQ (SEQ. ID. No.
127), dnaN (SEQ. ID. No. 121), holA (SEQ. ID. No. 123), and holB (SEQ. ID. No.
125). Those nucleotide sequences for T.ma. are as follows: dnaX (SEQ. ID. No. 141),
dnaE (SEQ. ID. No. 137), dnaQ (SEQ. ID. No. 139), dnaN (SEQ. ID. No. 143), holA
(SEQ. ID. No. 145), and holB (SEQ. ID. No. 147). Those nucleotide sequences for
B.st. are as follows: dnaX (SEQ. ID. No. 181), polC (SEQ. ID. No. 183), dnaN
(SEQ. ID. No. 173), holA (SEQ. ID. No. 177), and holB (SEQ. ID. No. 179).
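For cross-referencing, the gene-to-sequence assignments recited above can be held in a simple lookup table. The sketch below is illustrative only: the dictionary and function names are assumptions introduced here, subunit names are spelled out in ASCII, and B.st. is omitted because its list is organized differently (polC in place of dnaE and dnaQ).

```python
# Nucleotide SEQ. ID. Nos. transcribed from the paragraph above.
GENE_SEQ_ID = {
    "T.th.": {"dnaX": 3, "dnaE": 86, "dnaQ": 94,
              "dnaN": 106, "holA": 157, "holB": 155},
    "A.ae.": {"dnaX": 119, "dnaE": 117, "dnaQ": 127,
              "dnaN": 121, "holA": 123, "holB": 125},
    "T.ma.": {"dnaX": 141, "dnaE": 137, "dnaQ": 139,
              "dnaN": 143, "holA": 145, "holB": 147},
}

# Subunit encoded by each gene, as stated in the text.
GENE_TO_SUBUNIT = {
    "dnaX": "gamma/tau",
    "dnaE": "alpha",
    "dnaQ": "epsilon",
    "dnaN": "beta",
    "holA": "delta",
    "holB": "delta prime",
}


def gene_seq_id(organism: str, gene: str) -> int:
    """Return the nucleotide SEQ. ID. No. for a gene (hypothetical helper)."""
    try:
        return GENE_SEQ_ID[organism][gene]
    except KeyError:
        raise KeyError(f"no SEQ. ID. No. recorded for {gene} in {organism}") from None


print(gene_seq_id("A.ae.", "holA"), GENE_TO_SUBUNIT["holA"])
```

A flat mapping like this keeps the identifiers exactly as recited and makes no assumption about numbering patterns between nucleotide and amino acid sequences.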



The invention also provides methods and products for identifying,
isolating and cloning DNA molecules which encode such accessory subunits encoded
by the recited genes of the DNA polymerase III-type enzyme hereof.

Yet further, the invention extends to Polymerase III-type enzymes
prepared by the purification of an extract taken from, e.g., the particular thermophile
under examination, treated with appropriate solvents and then subjected to
chromatographic separation on, e.g., an anion exchange column, followed by analysis
of long chain synthetic ability or Western analysis of the respective peaks against
antibody to at least one of the anticipated enzyme subunits to confirm presence of Pol
III, and thereafter, peptide sequencing of subunits that co-purify and amplification to
obtain the putative gene and its encoded enzyme.

The present invention also relates to recombinant γ, τ, ε, α (as well as
PolC), δ, δ' and β subunits and SSB from thermophiles. In the instance of the γ and τ
subunits of T.th., the invention includes the characterization of a frameshifting
sequence that is internal to the gene and specifies relative abundance of the γ and τ
gene products of T.th. dnaX. From this characterization, expression of either one of
the subunits can be increased at the expense of the other (i.e., a mutant frameshift could
make all τ, while simple recloning at the end of the frameshift could make exclusively γ
and no τ).

In a further aspect of the present invention, DNA probes can be
constructed from the DNA sequences coding for, e.g., the T.th., A.ae., T.ma., or B.st.
dnaX, dnaQ, dnaE, dnaA, dnaN, holA, holB, and ssb genes, conserved variants and
active fragments thereof, all as defined herein, and may be used to identify and isolate
the corresponding genes coding for the subunits of DNA polymerase III holoenzyme
from other thermophiles, such as those listed earlier herein. Accordingly, all
chromosomal replicases (DNA Polymerase III-type) from thermophilic sources are
contemplated and included herein.

The invention also extends to methods for identifying Polymerase
III-type enzymes by use of the techniques of long-chain extension and elucidation of
subunits with antibodies, as described herein and with reference to the examples.

The invention further extends to the isolated and purified DNA
Polymerase III from T.th., A.ae., T.ma., and B.st., the amino acid sequences of the γ, τ,
ε, α (as well as PolC), δ, δ', and β subunits and SSB, as set forth herein, and the
nucleotide sequences of the corresponding genes from T.th., A.ae., T.ma., or B.st. set
forth herein, as well as to active fragments thereof, oligonucleotides and probes
prepared or derived therefrom and the transformed cells that may be likewise
prepared. Accordingly, the invention comprises the individual subunits enumerated
above and hereinafter, corresponding isolated polynucleotides and respective amino
acid sequences for each of the γ, τ, ε, α (as well as PolC), δ, δ', and β subunits and
SSB, and to conserved variants, fragments, and the like, as well as to methods of their
preparation and use in DNA amplification and sequencing. In a particular
embodiment, the invention extends to vectors for the expression of the subunit genes
of the present invention.

The invention also includes methods for the preparation of the DNA
Polymerase III-type enzymes and the corresponding subunit genes of the present
invention, and to the use of the enzymes and constructs having active fragments
thereof, in the preparation, reconstitution or modification of like enzymes, as well as
in amplification and sequencing of DNA by methods such as PCR, and like protocols,
and to the DNA molecules amplified and sequenced by such methods. In this regard,
a Pol III-type enzyme that is reconstituted in the absence of ε, or using a mutated ε
with less 3'-5' exonuclease activity, may be a superior enzyme in either PCR or DNA
sequencing applications (e.g., Tabor et al., 1995).

The invention is directed to methods for amplifying and sequencing a
DNA molecule, particularly via the polymerase chain reaction (PCR), using the
present DNA polymerase III-type enzymes or complexes. In particular, the invention
extends to methods of amplifying and sequencing of DNA using thermostable Pol
III-type enzyme complexes isolated from thermophilic bacteria such as Thermotoga and
Thermus species, or recombinant thermostable enzymes. The invention also provides
amplified DNA molecules made by the methods of the invention, and kits for
amplifying or sequencing a DNA molecule by the methods of the invention.

In this connection, the invention extends to methods for amplification
of DNA that can achieve long chain extension of primed DNA, as by the application
and use of Polymerase III-type enzymes of the present invention. An illustration of
such methods is presented in Examples 15 and 16, infra.

Likewise, kits for amplification and sequencing of such DNA
molecules are included, which kits contain the enzymes of the present invention,
including subunits thereof, together with other necessary or desirable reagents and
materials, and directions for use. The details of the practice of the invention as set
forth above and later on herein, and with reference to the patents and literature cited
herein, are all expressly incorporated herein by reference and made a part hereof.

As stated, and in accordance with a principal object of the present
invention, Polymerase III-type enzymes and their sub-units are provided that are
derived from thermophiles and that are adapted to participate in improved DNA
amplification and sequencing techniques, and the consequent ability to prepare larger
DNA strands more rapidly and accurately.
It is a further object of the present invention to provide DNA
molecules that are amplified and sequenced using the Polymerase III-type enzymes
hereof.

It is a still further object of the present invention to provide enzymes
and corresponding methods for amplification and sequencing of DNA that can be
practiced without the participation of the clamp-loading component of the enzyme.

It is a still further object of the present invention to provide kits and
other assemblies of materials for the practice of the methods of amplification and
sequencing as aforesaid, that include and use the DNA polymerase III-type enzymes
herein as part thereof.

One goal of this invention is to fully reconstitute the rapid and
processive replicase from an extreme thermophilic eubacterium from fully
recombinant protein subunits. One might think that the extreme heat in which these
bacteria grow may have resulted in a completely different solution to the problem of
chromosome replication. Prior to filing of the previously-identified priority
applications, it is believed that Pol III had not been identified in any thermophile until
the present inventors found that Thermus thermophilus, which grows at a rather high
temperature of 70-80°C, would appear to contain a Pol III. Subsequent to this
invention, the genome sequence of A. aeolicus was published which shows dnaE,
dnaN, and dnaX genes. However, previous work did not fully reconstitute the
working replication machinery from fully recombinant subunits. The holA and
holB genes have not been identified previously in T. thermophilus or A. aeolicus, and
studies in the E. coli system show that delta and delta prime, encoded by holA and holB,
respectively, are essential to loading the beta clamp onto DNA and, thus, are essential
for rapid and processive holoenzyme function (U.S. Patent Nos. 5,583,026 and
5,668,004 to O'Donnell, which are hereby incorporated by reference).

This invention fully reconstitutes a functional DNA polymerase III
holoenzyme from the extreme thermophiles Thermus thermophilus and Aquifex
aeolicus. Aquifex aeolicus grows at an even higher temperature than Thermus
thermophilus, up to 85°C. In this invention, the genes of Thermus thermophilus,
Aquifex aeolicus, Thermotoga maritima, and Bacillus stearothermophilus that are
necessary to reconstitute the complete DNA polymerase III machinery, which acts as
a rapid and processive polymerase, are identified. Indeed, the delta prime (holB) and
delta (holA) subunits are needed.

The dnaE, dnaN, dnaX, dnaQ, holA, and holB genes are used to
express and purify the protein "gears", and the proteins are used to reassemble the
replication machine. The T.th. Pol III is similar to E. coli. The A.ae. Pol III is slightly
dissimilar from the machinery of previously studied replicases. The A.ae. dnaX gene
encodes only one protein, tau, and in this fashion is similar to the dnaX of the gram
positive organism, Staphylococcus aureus. In contrast, the dnaX of the gram negative
cell, E. coli, produces two proteins. The Aquifex aeolicus polymerase subunit, alpha
(encoded by dnaE), does not contain the 3'-5' proofreading exonuclease. In this
regard, A. aeolicus is similar to E. coli, but dissimilar to the replicase of the gram
positive organisms. In gram positive organisms, the PolC polymerase subunit of the
replicase contains the exonuclease activity in the same polypeptide chain as the
polymerase (Low et al., 1976; Barnes et al., 1994; Pacitti et al., 1995). Further, the
polymerase III of thermophilic bacteria retains activity at high temperature.

Thermostable rapid and processive three component DNA polymerases
can be applied to several important uses. Current methods of DNA
sequencing and DNA amplification use enzymes that are much slower and thus could
be improved upon. This is especially true of amplification, as the three component
polymerase is capable of speed and high processivity, making possible amplification
of very long (tens of Kb to Mb) lengths of DNA in a time-efficient manner. These
three component polymerases also function in conjunction with a replicative helicase
(DnaB), and thus are capable of amplification at a single temperature, using the
helicase to melt the DNA duplex. This property could be useful in some methods of
amplification, and in polymerase chain reaction (PCR) methodology. For example,
the ατδδ'β form of the E. coli DNA polymerase III holoenzyme has been shown to
function in both DNA sequencing and PCR (U.S. Patent Nos. 5,583,026 and
5,668,004 to O'Donnell).

Other objects and advantages will become apparent from a review of 
the ensuing description which proceeds with reference to the following illustrative 
drawings. 

DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a schematic depiction of the structure and components of 
enzymes of the general family to which the enzymes of the present invention belong. 

FIGURE 2 is an alignment of the N-terminal regions of E. coli (SEQ. 
ID. No. 19) and B. subtilis (SEQ. ID. No. 20) dnaX gene product. Asterisks indicate 
identities. The ATP binding consensus sequence is indicated. The two regions used 
for PCR primer design are shown in bold. 

FIGURE 3 is an image showing the Southern analysis of T. 
thermophilus genomic DNA. Genomic DNA was analyzed for presence of the dnaX 
gene using the PCR radiolabeled probe. Enzymes used for digestion are shown above 
each lane. The numbering to the right corresponds to the length of DNA fragments 
(kb). 

FIGURES 4A and 4B depict the full sequence of the dnaX gene of T. 
thermophilus. DNA sequence (upper case, and corresponding to SEQ ID No. 1) and 
predicted amino acid sequence (lower case, and corresponding to SEQ ID No. 2) 
yields a 529 amino acid protein (τ) of 58.0 kDa. A putative frameshifting sequence 
containing several A residues 1478-1486 (underlined) may produce a smaller protein 
(γ) of 49.8 kDa. The potential Shine-Dalgarno (S.D.) signal is bold and underlined. 
The start codon is in bold, and the stop codon for τ is marked by an asterisk. The 
potential stop codon for γ is shown in bold after the frameshift site, and two potential 
Shine-Dalgarno sequences upstream of the frameshift site are indicated. Sequences of 
the primers used for PCR are shown in italics above the nucleotide sequence of dnaX. 
The ATP binding site is indicated, and the asterisks above the four Cys residues near 
the ATP site indicate the putative Zn2+ finger. The proline rich area is indicated 
above the sequence. Numbering of the nucleotide sequence is presented to the right. 
Numbering of the amino acid sequence of τ is shown in parenthesis to the right. 
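
By way of illustration only, the effect of a -1 ribosomal frameshift within a run of A residues can be sketched in Python. The sequence below is a made-up toy message and is not the T.th. dnaX sequence; the function names and the slip position are hypothetical. The sketch only shows how slipping back one nucleotide changes the reading frame so that an earlier stop codon is reached and a shorter product results.

```python
# Toy illustration of -1 ribosomal frameshifting (hypothetical sequence, not dnaX).
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start=0):
    """Translate seq from index start until a stop codon or the end of seq."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_shift(seq, slip_at):
    """Read in frame 0 up to index slip_at (a multiple of 3), then slip back
    one nucleotide and continue in the -1 frame."""
    head = translate(seq[:slip_at])
    tail = translate(seq, start=slip_at - 1)
    return head + tail


# ATG GCT, then nine A residues (the "slippery" run), then downstream codons.
TOY = "ATGGCT" + "AAAAAAAAA" + "GCTGACCGGTTCTAA"

full_length = translate(TOY)                        # no frameshift
shifted = translate_with_minus1_shift(TOY, 15)      # slip at the end of the A run

if __name__ == "__main__":
    print("frame 0 product:", full_length)
    print("-1 shift product:", shifted)
```

In this toy case the frameshifted product shares its N-terminal residues with the full-length product and terminates shortly after the A run, which is the general pattern described for the smaller dnaX product.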

FIGURE 4C depicts the isolated DNA coding sequence for the dnaX 
gene (also present in FIGURES 3A and 3B) in accordance with the invention, which 
corresponds to SEQ. ID. No. 3. 

FIGURE 4D depicts the polypeptide sequence of the γ subunit of the 
Polymerase III of the present invention, which corresponds to SEQ. ID. No. 4. 

FIGURE 4E depicts the polypeptide sequence of the γ subunit of the 
Polymerase III of the present invention defined by a -1 frameshift, which corresponds 
to SEQ. ID. No. 4. 

FIGURE 4F depicts the polypeptide sequence of the γ subunit of the 
Polymerase III of the present invention defined by a -2 frameshift, which corresponds 
to SEQ. ID. No. 5. 

FIGURES 5A-B are alignments of the γ/τ ATP binding domains for 
different bacteria. Dots indicate those residues that are identical to the E. coli dnaX 
sequence. The ATP consensus site is underlined, and the conserved cysteine residues 
that form the zinc finger are indicated with asterisks. E. coli, Escherichia coli (SEQ. 
ID. No. 21); H. inf., Haemophilus influenzae (SEQ. ID. No. 22); B. sub., Bacillus 
subtilis (SEQ. ID. No. 23); C. cres., Caulobacter crescentus (SEQ. ID. No. 24); M. 
gen., Mycoplasma genitalium (SEQ. ID. No. 25); T.th., Thermus thermophilus (SEQ. 
ID. No. 26). Alignments were produced using Clustal. 

FIGURE 6 is a diagram indicating a signal for ribosomal frameshifting 
in T.th. dnaX. The diagram shows part of the sequence of the RNA (SEQ. ID. No. 27) 
around the frameshifting site (SEQ. ID. No. 28), including the suspected slippery 
sequence A9 (bold italic). The stop codon in the -2 reading frame is indicated. Also 
indicated are potential stem loop structures and the nearest stop codons in the -1 
reading frame. 

FIGURE 7 is an image showing a Western analysis of γ and τ in T.th. 
cells. Whole cells were lysed in SDS and electrophoresed on a 10% SDS 
polyacrylamide gel then transferred to a membrane and probed with polyclonal 
antibody against E. coli γ/τ as described in Experimental Procedures. Positions of 
molecular weight size markers are shown to the left. Putative T.th. γ and τ are 
indicated to the right. 

FIGURES 8A-B are images of E. coli colonies expressing T.th. dnaX 
-1 and -2 frameshifts. The region of the dnaX gene slippery sequence was cloned into 
the lacZ gene of pUC19 in three reading frames, then transformed into E. coli cells 
and plated on LB plates containing X-gal. The slippery sequence was also mutated by 
inserting two G residues into the A9 sequence and then cloned into pUC19 in all three 
reading frames. The colors of the colonies observed are indicated by the plus signs. The 
picture shows the colonies; the type of frameshift required for readthrough (blue 
color) is indicated next to the sector. 

FIGURE 9 shows the construction of the T.th. γ/τ expression vector. A 
genomic fragment containing a partial sequence of dnaX was cloned into pALTER-1. 
This fragment was subcloned into pUC19 (pUC19_dnaX). Then the N-terminal 
section of dnaX was amplified such that the fragment was flanked by NdeI (at the 
initiating codon) and the internal BamHI site. This fragment was inserted to form the 
entire coding sequence of the dnaX gene in pUC19 (pUC19dnaX). The dnaX gene 
was then cloned behind the polyhistidine leader in the T7 based expression vector 
pET16 to give pET16dnaX. Details are in "Experimental Procedures". 

FIGURES 10A-C illustrate the purification of recombinant T.th. γ and 
τ subunits. T.th. γ and τ subunits were expressed in E. coli harboring pET16dnaX. 
Molecular size markers are shown to the left of the gels, and the two induced proteins 
are labeled as g and t to the right of the gel. Panel A) 10% SDS gel of E. coli whole 
cell lysates before and after induction with IPTG. Panel B) 8% SDS gel of the 
purification two steps after cell lysis. First lane: the lysate was applied to a HiTrap 
Nickel chromatography column. Second lane: the T.th. γ/τ subunits were further 
purified on a Superose 12 gel filtration column. Third lane: the E. coli γ and τ 
subunits. Panel C) Western analysis of the pure T.th. γ and τ subunits (first lane) and 
E. coli γ and τ subunits (second lane). 

FIGURES 11A-B show the gel filtration of T.th. γ and τ. T.th. γ and τ 
were gel filtered on a Superose 12 column. Column fractions were analyzed for 
ATPase activity and in a Coomassie Blue stained 10% SDS polyacrylamide gel. 
Positions of molecular weight markers are shown to the left of the gel. The elution 
positions of size standards analyzed in a parallel Superose 12 column under identical 
conditions are indicated above the gel. Thyroglobulin (670 kDa), bovine gamma globulin 
(150 kDa), chicken ovalbumin (44 kDa), equine myoglobin (17 kDa). 

FIGURES 12A-C illustrate the characterization of the T.th. γ and τ 
ATPase activity. The T.th. γ/τ and E. coli τ subunits are compared in their ATPase 
activity characteristics. Due to the greater activity of E. coli τ, the values are plotted 
as percent for ease of comparison. Actual specific activities for 100% values are 
given below as pmol ATP hydrolyzed/30 min./pmol T.th. γ/τ (or pmol E. coli τ). 
Panel A) T.th. γ and τ ATPase is stimulated by the presence of ssDNA. T.th. γ/τ was 
incubated at 65°C. Specific activity was: 11.5 (+DNA); 2.5 (-DNA); E. coli τ was 
assayed at 37°C. Specific activity values were: 112.5 (+DNA); 7.3 (-DNA). Panel B) 
Temperature stability of DNA stimulated ATPase activity. T.th. γ/τ, 11.3 (65°C); E. 
coli τ, 97.5 (37°C). Panel C) Stability of T.th. γ/τ ATPase to NaCl. T.th. γ/τ, 8.1 (100 
mM added NaCl and 65°C); E. coli τ, 52.7 (0 M added NaCl and 37°C). 
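
As a worked illustration of the Panel A values, the following Python sketch computes the fold stimulation by ssDNA and expresses the activity without DNA as a percent of the 100% (+DNA) value. The figures are those quoted in the description above; the dictionary layout and function names are hypothetical and are not part of the assay.

```python
# Specific activities quoted for Figure 12, Panel A
# (pmol ATP hydrolyzed / 30 min / pmol protein).
SPECIFIC_ACTIVITY = {
    "T.th. (65 C)": {"plus_dna": 11.5, "minus_dna": 2.5},
    "E. coli (37 C)": {"plus_dna": 112.5, "minus_dna": 7.3},
}


def fold_stimulation(plus_dna, minus_dna):
    """Ratio of activity with ssDNA to activity without ssDNA."""
    return plus_dna / minus_dna


def percent_of_reference(value, reference):
    """Express value as a percent of the reference (100%) value."""
    return 100.0 * value / reference


if __name__ == "__main__":
    for name, v in SPECIFIC_ACTIVITY.items():
        fold = fold_stimulation(v["plus_dna"], v["minus_dna"])
        pct = percent_of_reference(v["minus_dna"], v["plus_dna"])
        print(f"{name}: {fold:.1f}-fold stimulation; -DNA is {pct:.1f}% of +DNA")
```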

FIGURES 13A-13C are graphs that summarize the purification of the 
DNA polymerase III from T.th. extracts. Panel A) shows the activity and total protein 
in column fractions from the Heparin Agarose column. Peak 1 fractions were 
chromatographed on ATP agarose. Panel B) depicts the ATP-agarose column step, 
and Panel C) shows the total protein and DNA polymerase activity eluted from the 
MonoQ column. 

FIGURES 14A-B are SDS polyacrylamide gels of T.th. subunits. 
Fig. 14A is a 12% SDS polyacrylamide gel stained with Coomassie Blue of the 
MonoQ column. Load stands for the material loaded onto the column (ATP agarose 
bound fractions). FT stands for protein that flowed through the MonoQ column. 
Fractions are indicated above the gel. T.th. subunits in fractions 17-19 are indicated 
by the labels placed between fractions 18 and 19. Additional small subunits may be 
present but difficult to visualize, or may have run off the gel. E. coli γ,δ shows a 
mixture of the α, γ, and δ subunits of DNA polymerase III holoenzyme (they are 
labeled to the right in the figure). Fig. 14B shows the Western results of an SDS gel 
of the MonoQ fractions probed with rabbit antiserum raised against the E. coli α 
subunit. Load and FT are as described in Panel A. Fraction numbers are shown 
above the gel. The band that comigrates with E. coli α, and the band in the Coomassie 
Blue stained gel in Panel A, is marked with an arrow. This band was analyzed for 
microsequence and the results are shown in Fig. 15. 

FIGURES 15A-B show the alignments of the peptides obtained from 
T.th. α subunit, TTH1 (shown in A) and TTH2 (shown in B) with the amino acid 
sequences of the α subunits of other organisms. The amino acid number of these 
regions within each respective protein sequence are shown to the right. The 
abbreviations of the organisms are as follows. E.coli - Escherichia coli, V.chol - 
Vibrio cholerae, H.inf. - Haemophilus influenzae, R.prow. - Rickettsia prowazekii, 
H.pyl. - Helicobacter pylori, S.sp. - Synechocystis sp., M.tub. - Mycobacterium 
tuberculosis, T.th. - Thermus thermophilus. 

FIGURES 16A-C show a nucleotide (Panels A-B, SEQ. ID. No. 86) 
and amino acid (Panel C, SEQ. ID. No. 87) sequence of the dnaE gene encoding the α 
subunit of DNA polymerase III replication enzyme. 

FIGURE 17 shows an alignment of the amino acid sequence of ε 
subunits encoded by dnaQ of several organisms. The amino acid sequence of the 
Thermus thermophilus ε subunit of dnaQ is also shown. T.th., Thermus thermophilus 
(SEQ. ID. No. 88); D.rad., Deinococcus radiodurans (SEQ. ID. No. 89); Bac.sub., 
Bacillus subtilis (SEQ. ID. No. 90); H.inf., Haemophilus influenzae (SEQ. ID. 
No. 91); E.c., Escherichia coli (SEQ. ID. No. 92); H.pyl., Helicobacter pylori (SEQ. 
ID. No. 93). The regions used to obtain the inner part of the dnaQ gene are shown in 
bold. The starts used for expression of the T.th. ε subunit are marked. 

FIGURES 18A-B show the nucleotide (Panel A, SEQ. ID. No. 94) and 
amino acid (Panel B, SEQ. ID. No. 95) sequence of the dnaQ gene encoding the ε 
subunit of DNA polymerase III replication enzyme. 

FIGURES 19A-B show an alignment of the DnaA protein of several 
organisms. The amino acid sequence of the Thermus thermophilus DnaA protein is 
also shown. P. mar., Pseudomonas marcesans (SEQ. ID. No. 96); Syn.sp., 
Synechocystis sp. (SEQ. ID. No. 97); Bac.sub., Bacillus subtilis (SEQ. ID. No. 98); M. 
tub., Mycobacterium tuberculosis (SEQ. ID. No. 99); T.th., Thermus thermophilus 
(SEQ. ID. No. 100); E.coli., Escherichia coli (SEQ. ID. No. 101); T. mar., 
Thermotoga maritima (SEQ. ID. No. 102); and H.pyl., Helicobacter pylori (SEQ. ID. 
No. 103). 

FIGURES 20A-B show the nucleotide (Panel A, SEQ. ID. No. 104) 
and amino acid (Panel B, SEQ. ID. No. 105) sequence of the dnaA gene of Thermus 
thermophilus. 

FIGURES 21A-B show the nucleotide (Panel A, SEQ. ID. No. 106) 
and amino acid (Panel B, SEQ. ID. No. 107) sequence of the dnaN gene encoding the 
β subunit of DNA polymerase III replication enzyme. 

FIGURES 22A-B show an alignment of the β subunit of T.th. to the β 
subunits of other organisms. T.th, Thermus thermophilus (SEQ. ID. No. 108); E. 
coli, Escherichia coli (SEQ. ID. No. 109); P. mirab, Proteus mirabilis (SEQ. ID. 
No. 110); H. infl, Haemophilus influenzae (SEQ. ID. No. 111); P. put, Pseudomonas 
putida (SEQ. ID. No. 112); and B. cap., Buchnera aphidicola (SEQ. ID. No. 113). 

FIGURE 23 is a map of the pET24:dnaN plasmid. The functional 
regions of the plasmid are indicated by arrows and italic, restriction sites are marked 
with bars and symbols. The hatched parts in the plasmid correspond to T.th. dnaN. 

FIGURES 24A-B show the induction of T.th. β in E. coli cells 
harboring the T.th. β expression vector. Panel A is the cell induction. The first lane 
shows molecular weight markers (MW). The second lane shows uninduced E. coli 
cells, and the third lane shows induced E. coli. The induced T.th. β is indicated by the 
arrow shown to the left. Induced cells were lysed then treated with heat and the 
soluble portion was chromatographed on MonoQ. Panel B shows the results of 
MonoQ purification of T.th. β. 

FIGURE 25A is a schematic depiction of the use of the 
enzymes of the present invention in accordance with an alternate embodiment hereof. 
In this scheme the clamp (β or PCNA) slides over the end of linear DNA to enhance 
the polymerase (Pol III-type such as Pol III, Pol β or Pol δ). In this fashion the clamp 
loader activity is not needed. 

FIGURE 25B graphically demonstrates the results of the practice of 
the alternate embodiment of the invention described and set forth in Example 15, 
infra. Lane 1, E. coli Pol III without β; Lane 2, E. coli with β; Lane 3, human Pol δ 
without PCNA; Lane 4, human Pol δ with PCNA; Lane 5, T.th. Pol III without T.th. β; 
Lane 6, T.th. Pol III with T.th. β. The respective pmol synthesis in lanes 1-6 are: 6, 
35, 2, 24, 0.6 and 1.9. 
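
For illustration, the lane values above can be paired (polymerase alone versus polymerase with its clamp) to give the fold enhancement contributed by the clamp on linear DNA. The Python sketch below assumes the six values are read in lane order as 6, 35, 2, 24, 0.6 and 1.9 pmol; the pairing labels and function name are hypothetical.

```python
# pmol synthesis for lanes 1-6 of Figure 25B, read in lane order.
LANES_PMOL = [6.0, 35.0, 2.0, 24.0, 0.6, 1.9]

# Each polymerase occupies two adjacent lanes: without clamp, then with clamp.
PAIRS = ["E. coli Pol III", "human Pol delta", "T.th. Pol III"]


def clamp_enhancement(lanes, labels):
    """Return {label: synthesis with clamp / synthesis without clamp}."""
    result = {}
    for i, label in enumerate(labels):
        without_clamp, with_clamp = lanes[2 * i], lanes[2 * i + 1]
        result[label] = with_clamp / without_clamp
    return result


ENHANCEMENT = clamp_enhancement(LANES_PMOL, PAIRS)

if __name__ == "__main__":
    for label, fold in ENHANCEMENT.items():
        print(f"{label}: {fold:.1f}-fold more synthesis with the clamp")
```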

FIGURES 26A-B show the use of T.th. Pol III in extending singly 
primed M13mp18 to an RFII form. The scheme in Fig. 26A shows the primed 
template in which a DNA 57mer was annealed to the M13mp18 ssDNA circle. Then 
T.th. β subunit (produced recombinantly) and T.th. Pol III were added to the DNA in 
the presence of radioactive nucleoside triphosphates. In Fig. 26B, the products of the 
reaction were analyzed in a 0.8% native agarose gel. The position of ssDNA starting 
material, the RFII product, and of intermediate species, are shown to the sides of the 
gel. Lane 1, use of Pol III. Lane 2, use of the non-Pol III DNA polymerase. 

FIGURE 27 is an SDS polyacrylamide gel of the proteins of the A. 
aeolicus replication machinery. 

FIGURE 28 is an SDS polyacrylamide gel analysis of the MonoQ 
fractions of the method used to reconstitute and purify the A. aeolicus τδδ' complex. 

FIGURE 29 is an SDS polyacrylamide gel analysis of the gel filtration 
column fractions used in the preparation of the A. aeolicus ατδδ' complex. The 
bottom gel analysis shows the profile obtained using the A. aeolicus α subunit 
(polymerase) in the absence of the other subunits. 

FIGURE 30 is an alkaline agarose gel analysis of reaction products for 
extension of a single primer around a 7.2 kb M13mp18 circular ssDNA genome that 
has been coated with A. aeolicus SSB. The time course on the left is produced by 
ατδδ'/β, and the time course on the right is produced by ατδδ' in the absence of β. 

FIGURE 31 is a graph illustrating the optimal temperature for activity 
of the alpha subunit of Thermus replicase using a calf thymus DNA replication assay. 
Reactions were shifted to the indicated temperature for 5 minutes before detecting the 
level of DNA synthesis activity. 

FIGURE 32 is a graph illustrating the optimal temperature for activity 
of the alpha subunit of the Aquifex replicase using a calf thymus DNA replication 
assay. Reactions were shifted to the indicated temperature for 5 minutes before 
detecting the level of DNA synthesis activity. 

FIGURES 33A-E illustrate the heat stability of Aquifex components. 
Assays of either α (Fig. 33A), β (Fig. 33B), τδδ' complex (Fig. 33C), SSB (Fig. 33D) 
and ατδδ' complex (Fig. 33E) were performed after heating samples at the indicated 
temperatures. Components were heated in buffer containing the following: 0.1% 
Triton X-100 (filled diamonds); 0.05% Tween-20 and 0.01% NP-40 (filled circles); 4 
mM CaCl2 (filled triangles); 40% Glycerol (inverted filled triangles); 0.01% Triton X- 
100, 0.05% Tween-20, 0.01% NP-40, 4 mM CaCl2 (half-filled square); 40% Glycerol, 
0.1% Triton X-100 (open diamonds); 40% Glycerol, 0.05% Tween-20, 0.01% NP-40 
(open circles); 40% Glycerol, 4 mM CaCl2 (open triangles); 40% Glycerol, 0.01% 
Triton X-100, 0.05% Tween-20, 0.01% NP-40, 4 mM CaCl2 (half-filled diamonds). 

FIGURES 34A-B show the nucleotide sequence (SEQ. ID. No. 117) of 
the dnaE gene of A. aeolicus. 

FIGURE 35 shows the amino acid sequence (SEQ. ID. No. 118) of the 
α subunit of A. aeolicus. 

FIGURE 36 shows the nucleotide sequence (SEQ. ID. No. 119) of the 
dnaX gene of A. aeolicus. 

FIGURE 37 shows the amino acid sequence (SEQ. ID. No. 120) of the 
tau subunit of A. aeolicus. 

FIGURE 38 shows the nucleotide sequence (SEQ. ID. No. 121) of the 
dnaN gene of A. aeolicus. 

FIGURE 39 shows the amino acid sequence (SEQ. ID. No. 122) of the 
β subunit of A. aeolicus. 

FIGURE 40 shows the partial nucleotide sequence (SEQ. ID. No. 123) 
of the holA gene of A. aeolicus. 

FIGURE 41 shows the partial amino acid sequence (SEQ. ID. 
No. 124) of the δ subunit of A. aeolicus. 

FIGURE 42 shows the nucleotide sequence (SEQ. ID. No. 125) of the 
holB gene of A. aeolicus. 

FIGURE 43 shows the amino acid sequence (SEQ. ID. No. 126) of the 
δ' subunit of A. aeolicus. 

FIGURE 44 shows the nucleotide sequence (SEQ. ID. No. 127) of the 
dnaQ of A. aeolicus. 

FIGURE 45 shows the amino acid sequence (SEQ. ID. No. 128) of the 
ε subunit of A. aeolicus. 

FIGURE 46 shows the nucleotide sequence (SEQ. ID. No. 129) of the 
ssb gene of A. aeolicus. 

FIGURE 47 shows the amino acid sequence (SEQ. ID. No. 130) of the 
single-strand binding protein of A. aeolicus. 

FIGURE 48 shows the nucleotide sequence (SEQ. ID. No. 131) of the 
dnaB gene of A. aeolicus. 

FIGURE 49 shows the amino acid sequence (SEQ. ID. No. 132) of the 
DnaB helicase of A. aeolicus. 

FIGURE 50 shows the nucleotide sequence (SEQ. ID. No. 133) of the 
dnaG gene of A. aeolicus. 

FIGURE 51 shows the amino acid sequence (SEQ. ID. No. 134) of the 
DnaG primase of A. aeolicus. 

FIGURE 52 shows the nucleotide sequence (SEQ. ID. No. 135) of the 
dnaC gene of A. aeolicus. 

FIGURE 53 shows the amino acid sequence (SEQ. ID. No. 136) of the 
DnaC protein of A. aeolicus. 

FIGURE 54A-B shows the nucleotide sequence (SEQ. ID. No. 137) of 
the dnaE gene of T. maritima. 

FIGURE 55 shows the amino acid sequence (SEQ. ID. No. 138) of the 
α subunit of T. maritima. 

FIGURE 56 shows the nucleotide sequence (SEQ. ID. No. 139) of the 
dnaQ gene of T. maritima. 

FIGURE 57 shows the amino acid sequence (SEQ. ID. No. 140) of the 
ε subunit of T. maritima. 

FIGURE 58 shows the nucleotide sequence (SEQ. ID. No. 141) of the 
dnaX gene of T. maritima. 

FIGURE 59 shows the amino acid sequence (SEQ. ID. No. 142) of the 
tau subunit of T. maritima. 

FIGURE 60 shows the nucleotide sequence (SEQ. ID. No. 143) of the 
dnaN gene of T. maritima. 

FIGURE 61 shows the amino acid sequence (SEQ. ID. No. 144) of the 
β subunit of T. maritima. 

FIGURE 62 shows the nucleotide sequence (SEQ. ID. No. 145) of the 
holA gene of T. maritima. 

FIGURE 63 shows the amino acid sequence (SEQ. ID. No. 146) of the 
δ subunit of T. maritima. 

FIGURE 64 shows the nucleotide sequence (SEQ. ID. No. 147) of the 
holB gene of T. maritima. 

FIGURE 65 shows the amino acid sequence (SEQ. ID. No. 148) of the 
δ' subunit of T. maritima. 

FIGURE 66 shows the nucleotide sequence (SEQ. ID. No. 149) of the 
ssb gene of T. maritima. 

FIGURE 67 shows the amino acid sequence (SEQ. ID. No. 150) of the 
single-strand binding protein of T. maritima. 

FIGURE 68 shows the nucleotide sequence (SEQ. ID. No. 151) of the 
dnaB gene of T. maritima. 

FIGURE 69 shows the amino acid sequence (SEQ. ID. No. 152) of the 
DnaB helicase of T. maritima. 

FIGURE 70 shows the nucleotide sequence (SEQ. ID. No. 153) of the 
dnaG gene of T. maritima. 

FIGURE 71 shows the amino acid sequence (SEQ. ID. No. 154) of the 
DnaG primase of T. maritima. 

FIGURE 72 shows the nucleotide sequence (SEQ. ID. No. 155) of the 
holB gene of T. thermophilus. 

FIGURE 73 shows the amino acid sequence (SEQ. ID. No. 156) of the 
δ' subunit of T. thermophilus. 

FIGURE 74 shows the nucleotide sequence (SEQ. ID. No. 157) of the 
holA gene of T. thermophilus. 

FIGURE 75 shows the amino acid sequence (SEQ. ID. No. 158) of the 
δ subunit of T. thermophilus. 

FIGURE 76 shows the nucleotide sequence (SEQ. ID. No. 171) of the 
ssb gene of T. thermophilus. 

FIGURE 77 shows the amino acid sequence (SEQ. ID. No. 172) of the 
single-strand binding protein of T. thermophilus. 

FIGURE 78 shows the partial nucleotide sequence (SEQ. ID. No. 173) 
of the dnaN gene of B. stearothermophilus. 

FIGURE 79 shows the partial amino acid sequence (SEQ. ID. No. 174) 
of the β subunit of B. stearothermophilus. 

FIGURE 80 shows the nucleotide sequence (SEQ. ID. No. 175) of the 
ssb gene of B. stearothermophilus. 

FIGURE 81 shows the amino acid sequence (SEQ. ID. No. 176) of the 
single-strand binding protein of B. stearothermophilus. 

FIGURE 82 shows the nucleotide sequence (SEQ. ID. No. 177) of the 
holA gene of B. stearothermophilus. 

FIGURE 83 shows the amino acid sequence (SEQ. ID. No. 178) of the 
δ subunit of B. stearothermophilus. 

FIGURE 84 shows the nucleotide sequence (SEQ. ID. No. 179) of the 
holB gene of B. stearothermophilus. 

FIGURE 85 shows the amino acid sequence (SEQ. ID. No. 180) of the 
δ' subunit of B. stearothermophilus. 

FIGURES 86A-B show the partial nucleotide sequence (SEQ. ID. 
No. 181) of the dnaX gene of B. stearothermophilus. 

FIGURE 87 shows the partial amino acid sequence (SEQ. ID. No. 182) 
of the tau subunit of B. stearothermophilus. 

FIGURES 88A-B show the nucleotide sequence (SEQ. ID. No. 183) of 
the polC gene of B. stearothermophilus. 

FIGURE 89 shows the amino acid sequence (SEQ. ID. No. 184) of the 
PolC or α-large subunit of B. stearothermophilus. 

DETAILED DESCRIPTION OF THE INVENTION 

In accordance with the present invention there may be employed 
conventional molecular biology, microbiology, and recombinant DNA techniques 
within the skill of the art. Such techniques are explained fully in the literature. See, 
e.g., Sambrook et al., "Molecular Cloning: A Laboratory Manual" (1989); "Current 
Protocols in Molecular Biology" Volumes I-III (Ausubel, R. M., ed.) (1994); "Cell 
Biology: A Laboratory Handbook" Volumes I-III (Celis, J.E., ed.) (1994); "Current 
Protocols in Immunology" Volumes I-III (Coligan, J.E., ed.) (1994); "Oligonucleotide 
Synthesis" (M.J. Gait, ed.) (1984); "Nucleic Acid Hybridization" (B.D. Hames & 
S.J. Higgins, eds.) (1985); "Transcription And Translation" (B.D. Hames & S.J. 
Higgins, eds.) (1984); "Animal Cell Culture" (R.I. Freshney, ed.) (1986); 
"Immobilized Cells And Enzymes" (IRL Press) (1986); B. Perbal, "A Practical Guide 
To Molecular Cloning" (1984), each of which is hereby incorporated by reference. 

Therefore, if appearing herein, the following terms shall have the 
definitions set out below. 

The terms "DNA Polymerase III," "Polymerase III-type enzyme(s)", 
"Polymerase III enzyme complex(s)", "T.th. DNA Polymerase III", "A.ae. DNA 
Polymerase III", "T.mar. DNA Polymerase III", and any variants not specifically listed, 
may be used herein interchangeably, as are β subunit and sliding clamp and clamp, as 
are also γ complex, clamp loader, and RFC, and as used throughout the present application 
and claims refer to proteinaceous material including single or multiple proteins, and 
extend to those proteins having the amino acid sequence data described herein and 
presented in the Figures and corresponding Sequence Listing entries, and the 
corresponding profile of activities set forth herein and in the Claims. Accordingly, 
proteins displaying substantially equivalent or altered activity are likewise 
contemplated. These modifications may be deliberate, for example, such as 
modifications obtained through site-directed mutagenesis, or may be accidental, such 
as those obtained through mutations in hosts that are producers of the complex or its 
named subunits. Also, the terms "DNA Polymerase III," "T.th. DNA Polymerase III," 
and "γ and τ subunits", "β subunit", "α subunit", "ε subunit", "δ subunit", "δ' 
subunit", "SSB protein", "sliding clamp" and "clamp loader" are intended to include 
within their scope proteins specifically recited herein as well as all substantially 
homologous analogs and allelic variations. As used herein γ complex refers to a 
particular type of clamp loader that includes a γ subunit. 

Also as used herein, the term "thermolabile enzyme" refers to a DNA 
polymerase which is not resistant to inactivation by heat. For example, T5 DNA 
polymerase, the activity of which is totally inactivated by exposing the enzyme to a 
temperature of 90°C for 30 seconds, is considered to be a thermolabile DNA 
polymerase. As used herein, a thermolabile DNA polymerase is less resistant to heat 
inactivation than a thermostable DNA polymerase. A thermolabile DNA 
polymerase typically will also have a lower optimum temperature than a thermostable 
DNA polymerase. Thermolabile DNA polymerases are typically isolated from 
mesophilic organisms, for example mesophilic bacteria or eukaryotes, including 
certain animals. 

As used herein, the term "thermostable enzyme" refers to an enzyme 
which is stable to heat and is heat resistant and catalyzes (facilitates) combination of 
the nucleotides in the proper manner to form the primer extension products that are 
complementary to each nucleic acid strand. Generally, the synthesis will be initiated 
at the 3' end of each primer and will proceed in the 5' direction along the template 
strand, until synthesis terminates, producing molecules of different lengths. 

The thermostable enzyme herein must satisfy a single criterion to be 
effective for the amplification reaction, i.e., the enzyme must not become irreversibly 
denatured (inactivated) when subjected to the elevated temperatures for the time 
necessary to effect denaturation of double-stranded nucleic acids. Irreversible 
denaturation for purposes herein refers to permanent and complete loss of enzymatic 
activity. The heating conditions necessary for denaturation will depend, e.g., on the 
buffer salt concentration and the length and nucleotide composition of the nucleic 
acids being denatured, but typically range from about 90°C to about 96°C for a time 
depending mainly on the temperature and the nucleic acid length, typically about 0.5 
to four minutes. Higher temperatures may be tolerated as the buffer salt concentration 
and/or GC composition of the nucleic acid is increased. Preferably, the enzyme will 
not become irreversibly denatured at about 90°-100°C. 

The thermostable enzymes herein preferably have an optimum 
temperature at which they function that is higher than about 40°C, which is the 
temperature below which hybridization of primer to template is promoted, although, 
depending on (1) magnesium and salt concentrations and (2) composition and length 
of primer, hybridization can occur at higher temperature (e.g., 45°-70°C). The higher 
the temperature optimum for the enzyme, the greater the specificity and/or selectivity 
of the primer-directed extension process. However, enzymes that are active below 
40°C, e.g., at 37°C, are also within the scope of this invention provided they are heat- 
stable. Preferably, the optimum temperature ranges from about 50° to about 90°C, 
more preferably about 60° to about 80°C. In this connection, the term "elevated 
temperature" as used herein is intended to cover sustained temperatures of operation 
of the enzyme that are equal to or higher than about 60°C. 
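
The temperature ranges recited above can be summarized, purely as an illustration, by the following Python sketch. The function names are hypothetical, and the "about" qualifiers in the text are treated here as hard cutoffs, which is a simplifying assumption and not a limitation of the definitions.

```python
# Hypothetical helpers that bin a temperature optimum using the ranges recited
# in the text. "About" is treated as an exact cutoff (simplifying assumption).

def classify_optimum(temp_c: float) -> str:
    """Classify an enzyme's optimum temperature in degrees Celsius."""
    if 60.0 <= temp_c <= 80.0:
        return "more preferred"      # about 60 to about 80 C
    if 50.0 <= temp_c <= 90.0:
        return "preferred"           # about 50 to about 90 C
    if temp_c > 40.0:
        return "above 40 C"          # higher than about 40 C
    return "40 C or below"           # in scope only if the enzyme is heat-stable


def is_elevated_temperature(temp_c: float) -> bool:
    """'Elevated temperature' covers sustained operation at about 60 C or higher."""
    return temp_c >= 60.0


if __name__ == "__main__":
    for t in (37.0, 45.0, 55.0, 65.0, 85.0):
        print(t, classify_optimum(t), is_elevated_temperature(t))
```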

The term "template" as used herein refers to a double-stranded or 
single-stranded DNA molecule which is to be amplified, synthesized, or sequenced. 
In the case of a double-stranded DNA molecule, denaturation of its strands to form a 
first and a second strand is performed before these molecules may be amplified, 
synthesized or sequenced. A primer, complementary to a portion of a DNA template, 
is hybridized under appropriate conditions and the DNA polymerase of the invention 
may then synthesize a DNA molecule complementary to said template or a portion 
thereof. The newly synthesized DNA molecule, according to the invention, may be 
equal to or shorter in length than the original DNA template. Mismatch incorporation 
during the synthesis or extension of the newly synthesized DNA molecule may result 
in one or a number of mismatched base pairs. Thus, the synthesized DNA molecule 
need not be exactly complementary to the DNA template. 

The term "incorporating" as used herein means becoming a part of a 
DNA molecule or primer. 

As used herein "amplification" refers to any in vitro method for 
increasing the number of copies of a nucleotide sequence, or its complementary 
sequence, with the use of a DNA polymerase. Nucleic acid amplification results in 
the incorporation of nucleotides into a DNA molecule or primer thereby forming a 
new DNA molecule complementary to a DNA template. The formed DNA molecule 
and its template can be used as templates to synthesize additional DNA molecules. 
As used herein, one amplification reaction may consist of many rounds of DNA 
replication. DNA amplification reactions include, for example, polymerase chain 
reactions (PCR). One PCR reaction may consist of about 20 to 100 "cycles" of 
denaturation and synthesis of a DNA molecule. In this connection, the use of the term 
"long stretches of DNA" as it refers to the extension of primer along DNA is intended 
to cover such extensions of an average length exceeding 7 kilobases. Naturally, such 
length will vary, and all such variations are considered to be included within the scope 
of the invention. 
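
As a rough illustration of the cycle numbers mentioned above, the following Python sketch uses the common idealization in which each cycle multiplies the number of copies by (1 + efficiency), with efficiency equal to 1 for perfect doubling. This formula is a textbook simplification and is not taken from the present description; the function names and the default parameter values are hypothetical. The second helper applies the stated 7 kilobase threshold for "long stretches of DNA".

```python
# Idealized amplification arithmetic (textbook simplification, not the
# description's own method). Names and defaults are hypothetical.

def copies_after(cycles: int, initial_copies: float = 1.0,
                 efficiency: float = 1.0) -> float:
    """Copies after a number of cycles; efficiency 1.0 means perfect doubling."""
    return initial_copies * (1.0 + efficiency) ** cycles


def is_long_stretch(average_extension_kb: float) -> bool:
    """'Long stretches of DNA': average extension length exceeding 7 kilobases."""
    return average_extension_kb > 7.0


if __name__ == "__main__":
    print(copies_after(20))            # perfect doubling for 20 cycles
    print(copies_after(20, efficiency=0.8))
    print(is_long_stretch(7.2))
```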

As used herein, the term "holoenzyme" refers to a multi-subunit DNA 
polymerase activity comprising and resulting from various subunits which each may 
have distinct activities but which when contained in an enzyme reaction operate to 
carry out the function of the polymerase (typically DNA synthesis) and enhance its 
activity over use of the DNA polymerase subunit alone. For example, E. coli DNA 
polymerase III is a holoenzyme comprising three components of one or more subunits 
each: (1) a core component consisting of a heterotrimer of α, ε and θ subunits; (2) a β
component consisting of a β subunit dimer; and (3) a γ complex component consisting
of a heteropentamer of γ, δ, δ', χ and ψ subunits (see Studwell and O'Donnell, 1990).
These three components, and the various subunits of which they consist, are linked 
non-covalently to form the DNA polymerase III holoenzyme complex. However, 
they also function when not linked in solution. 

As used herein, "enzyme complex" refers to a protein structure
consisting essentially of two or more subunits of a replication enzyme, which may or
may not be identical, noncovalently linked to each other to form a multi-subunit
structure. An enzyme complex according to this definition ideally will have a
particular enzymatic activity, up to and including the activity of the replication
enzyme. For example, a "DNA pol III enzyme complex" as used herein means a
multi-subunit protein activity comprising two or more of the subunits of the DNA pol
III replication enzyme as defined above, and having DNA polymerizing or
synthesizing activity. Thus, this term encompasses the native replication enzyme, as
well as an enzyme complex lacking one or more of the subunits of the replication
enzyme (e.g., DNA pol III exo-, which lacks the ε subunit).



The amino acid residues described herein are preferred to be in the "L"
isomeric form. However, residues in the "D" isomeric form can be substituted for any
L-amino acid residue, as long as the desired functional property of
immunoglobulin-binding is retained by the polypeptide. NH2 refers to the free amino
group present at the amino terminus of a polypeptide. COOH refers to the free carboxy
group present at the carboxy terminus of a polypeptide. In keeping with standard
polypeptide nomenclature, J. Biol. Chem., 243:3552-59 (1969), abbreviations for amino
acid residues are shown in the following Table of Correspondence:



TABLE OF CORRESPONDENCE

SYMBOLS                   AMINO ACID
1-Letter    3-Letter

Y           Tyr           tyrosine
G           Gly           glycine
F           Phe           phenylalanine
M           Met           methionine
A           Ala           alanine
S           Ser           serine
I           Ile           isoleucine
L           Leu           leucine
T           Thr           threonine
V           Val           valine
P           Pro           proline
K           Lys           lysine
H           His           histidine
Q           Gln           glutamine
E           Glu           glutamic acid
W           Trp           tryptophan
R           Arg           arginine
D           Asp           aspartic acid
N           Asn           asparagine
C           Cys           cysteine



It should be noted that all amino-acid residue sequences are represented herein by
formulae whose left and right orientation is in the conventional direction of
amino-terminus to carboxy-terminus. Furthermore, it should be noted that a dash at the
beginning or end of an amino acid residue sequence indicates a peptide bond to a
further sequence of one or more amino-acid residues. The above Table is presented to
correlate the three-letter and one-letter notations which may appear alternately herein.

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) 
that functions as an autonomous unit of DNA replication in vivo; i.e., capable of 
replication under its own control.

A "vector" is a replicon, such as plasmid, phage or cosmid, to which 
another DNA segment may be attached so as to bring about the replication of the 
attached segment. 

A "DNA molecule" refers to the polymeric form of 
deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in either its
single-stranded form or a double-stranded helix. This term refers only to the primary and
secondary structure of the molecule, and does not limit it to any particular tertiary
forms. Thus, this term includes double-stranded DNA found, inter alia, in linear
DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In
discussing the structure of particular double-stranded DNA molecules, sequences may
be described herein according to the normal convention of giving only the sequence in 
the 5' to 3' direction along the nontranscribed strand of DNA (i.e., the strand having a 
sequence homologous to the mRNA). 

An "origin of replication" refers to those DNA sequences that 
participate in DNA synthesis.

A DNA "coding sequence" is a double-stranded DNA sequence which 
is transcribed and translated into a polypeptide in vivo when placed under the control 
of appropriate regulatory sequences. The boundaries of the coding sequence are 
determined by a start codon at the 5' (amino) terminus and a translation stop codon at 
the 3' (carboxyl) terminus. A coding sequence can include, but is not limited to,
prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences
from eukaryotic (e.g., mammalian) DNA, and even synthetic DNA sequences. A
polyadenylation signal and transcription termination sequence will usually be located 
3' to the coding sequence. 

Transcriptional and translational control sequences are DNA 
regulatory sequences, such as promoters, enhancers, polyadenylation signals, 
terminators, and the like, that provide for the expression of a coding sequence in a 
host cell. 

A "promoter sequence" is a DNA regulatory region capable of binding 
RNA polymerase in a cell and initiating transcription of a downstream (3' direction) 
coding sequence. For purposes of defining the present invention, the promoter 
sequence is bounded at its 3' terminus by the transcription initiation site and extends 
upstream (5' direction) to include the minimum number of bases or elements 
necessary to initiate transcription at levels detectable above background. Within the 
promoter sequence will be found a transcription initiation site (conveniently defined 
by mapping with nuclease S1), as well as protein binding domains (consensus
sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters 
will often, but not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic 
promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus 
sequences. 

An "expression control sequence" is a DNA sequence that controls and 
regulates the transcription and translation of another DNA sequence. A coding 
sequence is "under the control" of transcriptional and translational control sequences 
in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is 
then translated into the protein encoded by the coding sequence. 

A "signal sequence" can be included before the coding sequence. This 
sequence encodes a signal peptide, N-terminal to the polypeptide, that communicates 
to the host cell to direct the polypeptide to the cell surface or secrete the polypeptide 
into the media, and this signal peptide is clipped off by the host cell before the protein 
leaves the cell. Signal sequences can be found associated with a variety of proteins 
native to prokaryotes and eukaryotes. 

The term "oligonucleotide," as used generally herein, such as in 
referring to probes prepared and used in the present invention, is defined as a 
molecule comprised of two or more (deoxy)ribonucleotides, preferably more than
three. Its exact size will depend upon many factors which, in turn, depend upon the
ultimate function and use of the oligonucleotide. 

The term "primer" as used herein refers to an oligonucleotide, whether 
occurring naturally as in a purified restriction digest or produced synthetically, which 
is capable of acting as a point of initiation of synthesis when placed under conditions 
in which synthesis of a primer extension product, which is complementary to a 
nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing 
agent such as a DNA polymerase and at a suitable temperature and pH. The primer 
may be either single-stranded or double-stranded and must be sufficiently long to 
prime the synthesis of the desired extension product in the presence of the inducing 
agent. The exact length of the primer will depend upon many factors, including 
temperature, source of primer and use of the method. For example, for diagnostic 
applications, depending on the complexity of the target sequence, the oligonucleotide 
primer typically contains 15-25 or more nucleotides, although it may contain fewer 
nucleotides. 

The primers herein are selected to be "substantially" complementary to 
different strands of a particular target DNA sequence. This means that the primers 
must be sufficiently complementary to hybridize with their respective strands. 
Therefore, the primer sequence need not reflect the exact sequence of the template. 
For example, a non-complementary nucleotide fragment may be attached to the 5' end 
of the primer, with the remainder of the primer sequence being complementary to the 
strand. Alternatively, non-complementary bases or longer sequences can be 
interspersed into the primer, provided that the primer sequence has sufficient 
complementarity with the sequence of the strand to hybridize therewith and thereby 
form the template for the synthesis of the extension product. 

As used herein, the terms "restriction endonucleases" and "restriction 
enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at or
near a specific nucleotide sequence. 

A cell has been "transformed" by exogenous or heterologous DNA 
when such DNA has been introduced inside the cell. The transforming DNA may or 
may not be integrated (covalently linked) into chromosomal DNA making up the 
genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the 
transforming DNA may be maintained on an episomal element such as a plasmid. 
With respect to eukaryotic cells, a stably transformed cell is one in which the 
transforming DNA has become integrated into a chromosome so that it is inherited by
daughter cells through chromosome replication. This stability is demonstrated by the 
ability of the eukaryotic cell to establish cell lines or clones comprised of a population 
of daughter cells containing the transforming DNA. A "clone" is a population of cells 
derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of 
a primary cell that is capable of stable growth in vitro for many generations. 

Two DNA sequences are "substantially homologous" when at least 
about 75% (preferably at least about 80%, and most preferably at least about 90 or 
95%) of the nucleotides match over the defined length of the DNA sequences. 
Sequences that are substantially homologous can be identified by comparing the 
sequences using standard software available in sequence data banks, or in a Southern 
hybridization experiment under, for example, stringent conditions as defined for that 
particular system. Suitable conditions include those characterized by a hybridization 
buffer comprising 0.9M sodium citrate ("SSC") buffer at a temperature of about 37°C 
and washing in SSC buffer at a temperature of about 37°C; and preferably in a 
hybridization buffer comprising 20% formamide in 0.9M SSC buffer at a temperature 
of about 42°C and washing with 0.2x SSC buffer at about 42°C. Stringency 
conditions can be further varied by modifying the temperature and/or salt content of 
the buffer, or by modifying the length of the hybridization probe as is known to those 
of skill in the art. Defining appropriate hybridization conditions is within the skill of 
the art. See, e.g., Maniatis et al., 1982; Glover, 1985; Hames and Higgins, 1984. 
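
The sequence-comparison route to the "at least about 75%" criterion can be sketched as follows. This is a minimal illustration that assumes the two sequences are already aligned, equal in length, and free of gaps; the function names and the default threshold argument are hypothetical, and the standard software referred to above uses considerably more elaborate alignment methods.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def is_substantially_homologous_dna(seq_a: str, seq_b: str,
                                    threshold: float = 75.0) -> bool:
    """Apply the 'at least about 75%' nucleotide criterion (assumed cut-off)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ATGCATGC", "ATGCATGA"))
    print(is_substantially_homologous_dna("AAAA", "AATT"))
```

Raising `threshold` to 80, 90 or 95 corresponds to the preferred and most preferred ranges stated in the definition.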

It should be appreciated that also within the scope of the present 
invention are degenerate DNA sequences. By "degenerate" is meant that a different 
three-letter codon is used to specify a particular amino acid. It is well known in the 
art that the following codons can be used interchangeably to code for each specific 
amino acid: 



Phenylalanine (Phe or F)     UUU or UUC
Leucine (Leu or L)           UUA or UUG or CUU or CUC or CUA or CUG
Isoleucine (Ile or I)        AUU or AUC or AUA
Methionine (Met or M)        AUG
Valine (Val or V)            GUU or GUC or GUA or GUG
Serine (Ser or S)            UCU or UCC or UCA or UCG or AGU or AGC
Proline (Pro or P)           CCU or CCC or CCA or CCG
Threonine (Thr or T)         ACU or ACC or ACA or ACG
Alanine (Ala or A)           GCU or GCC or GCA or GCG
Tyrosine (Tyr or Y)          UAU or UAC
Histidine (His or H)         CAU or CAC
Glutamine (Gln or Q)         CAA or CAG
Asparagine (Asn or N)        AAU or AAC
Lysine (Lys or K)            AAA or AAG
Aspartic Acid (Asp or D)     GAU or GAC
Glutamic Acid (Glu or E)     GAA or GAG
Cysteine (Cys or C)          UGU or UGC
Arginine (Arg or R)          CGU or CGC or CGA or CGG or AGA or AGG
Glycine (Gly or G)           GGU or GGC or GGA or GGG
Tryptophan (Trp or W)        UGG
Termination codon            UAA (ochre) or UAG (amber) or UGA (opal)



It should be understood that the codons specified above are for RNA sequences. The 
corresponding codons for DNA have a T substituted for U. 
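
The degeneracy described above can be made concrete with a short sketch that builds the standard codon table, groups synonymous codons by amino acid, and translates either an RNA or a DNA coding strand (T is read as U). The names `CODON_TO_AA`, `SYNONYMOUS` and `translate` are illustrative and are not part of the disclosure; the table is the standard genetic code.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous ("degenerate") codons under the residue they encode.
SYNONYMOUS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMOUS[aa].append(codon)


def translate(coding_sequence: str) -> str:
    """Translate an RNA or DNA coding strand, stopping at the first stop codon."""
    rna = coding_sequence.upper().replace("T", "U")
    residues = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TO_AA[rna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    print(sorted(SYNONYMOUS["A"]))   # the four alanine codons
    print(translate("AUGGCUUAA"), translate("ATGGCCTAA"))
```

Both example sequences encode the same dipeptide even though they differ at the third position of the alanine codon, which is the sense in which degenerate sequences are equivalent.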

Mutations can be made, e.g., in SEQ. ID. No. 1, or any of the nucleic
acids set forth herein, such that a particular codon is changed to a codon which codes
for a different amino acid. Such a mutation is generally made by making the fewest
nucleotide changes possible. A substitution mutation of this sort can be made to
change an amino acid in the resulting protein in a non-conservative manner (i.e., by
changing the codon from an amino acid belonging to a grouping of amino acids
having a particular size or characteristic to an amino acid belonging to another
grouping) or in a conservative manner (i.e., by changing the codon from an amino
acid belonging to a grouping of amino acids having a particular size or characteristic
to an amino acid belonging to the same grouping). Such a conservative change
generally leads to less change in the structure and function of the resulting protein. A
non-conservative change is more likely to alter the structure, activity or function of
the resulting protein. The present invention should be considered to include
sequences containing conservative changes which do not significantly alter the
activity or binding characteristics of the resulting protein.

The following is one example of various groupings of amino acids:

Amino acids with nonpolar R groups
Alanine
Valine
Leucine
Isoleucine
Proline
Phenylalanine
Tryptophan
Methionine

Amino acids with uncharged polar R groups
Glycine
Serine
Threonine
Cysteine
Tyrosine
Asparagine
Glutamine

Amino acids with charged polar R groups (negatively charged at pH 6.0)
Aspartic acid
Glutamic acid

Basic amino acids (positively charged at pH 6.0)
Lysine
Arginine
Histidine (at pH 6.0)

Amino acids with phenyl groups:
Phenylalanine
Tryptophan
Tyrosine

Another grouping may be according to molecular weight (i.e., size of R groups):

Glycine                   75
Alanine                   89
Serine                    105
Proline                   115
Valine                    117
Threonine                 119
Cysteine                  121
Leucine                   131
Isoleucine                131
Asparagine                132
Aspartic acid             133
Glutamine                 146
Lysine                    146
Glutamic acid             147
Methionine                149
Histidine (at pH 6.0)     155
Phenylalanine             165
Arginine                  174
Tyrosine                  181
Tryptophan                204



Particularly preferred substitutions are: 

- Lys for Arg and vice versa such that a positive charge may be maintained;
- Glu for Asp and vice versa such that a negative charge may be maintained;
- Ser for Thr such that a free -OH can be maintained; and
- Gln for Asn such that a free NH2 can be maintained.

Amino acid substitutions may also be introduced to substitute an
amino acid with a particularly preferable property. For example, a Cys may be
introduced into a potential site for disulfide bridges with another Cys. A His may be
introduced as a particularly "catalytic" site (i.e., His can act as an acid or base and is
the most common amino acid in biochemical catalysis). Pro may be introduced
because of its particularly planar structure, which induces β-turns in the protein's
structure.

Two amino acid sequences are "substantially homologous" when at
least about 70% of the amino acid residues (preferably at least about 80%, and most 
preferably at least about 90 or 95%) are identical, or represent conservative 
substitutions. 
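
The amino acid criterion counts a position as matching when the residues are identical or represent a conservative substitution. The sketch below assumes that "conservative" means membership in the same charge/polarity grouping listed earlier (nonpolar, uncharged polar, negatively charged, basic); that choice, the pre-aligned equal-length inputs, and all function and variable names are illustrative assumptions, since other groupings (for example by molecular weight) are equally contemplated.

```python
# One-letter codes arranged by the charge/polarity groupings given in the text.
GROUPS = {
    "nonpolar": "AVLIPFWM",
    "uncharged_polar": "GSTCYNQ",
    "negatively_charged": "DE",
    "basic": "KRH",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def is_conservative(res_a: str, res_b: str) -> bool:
    """True when two different residues fall in the same grouping."""
    return res_a != res_b and GROUP_OF[res_a] == GROUP_OF[res_b]


def percent_similar(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions that are identical or conservative."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and pre-aligned")
    hits = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b or GROUP_OF[a] == GROUP_OF[b]
    )
    return 100.0 * hits / len(seq_a)


def is_substantially_homologous_protein(seq_a: str, seq_b: str,
                                        threshold: float = 70.0) -> bool:
    """Apply the 'at least about 70%' amino acid criterion (assumed cut-off)."""
    return percent_similar(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    print(is_conservative("K", "R"), is_conservative("D", "K"))
    print(percent_similar("MKDL", "MRES"))
```

In the example, Lys/Arg and Asp/Glu count as conservative while Leu/Ser does not, so three of four positions qualify.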

A "heterologous" region of the DNA construct is an identifiable 
segment of DNA within a larger DNA molecule that is not found in association with 
the larger molecule in nature. Thus, when the heterologous region encodes a 
mammalian gene, the gene will usually be flanked by DNA that does not flank the 
mammalian genomic DNA in the genome of the source organism. Another example 
of a heterologous coding sequence is a construct where the coding sequence itself is 
not found in nature (e.g., a cDNA where the genomic coding sequence contains 
introns, or synthetic sequences having codons different than the native gene). Allelic 
variations or naturally-occurring mutational events do not give rise to a heterologous 
region of DNA as defined herein. 

An "antibody" is any immunoglobulin, including antibodies and 
fragments thereof, that binds a specific epitope. The term encompasses polyclonal, 
monoclonal, and chimeric antibodies, the last mentioned described in further detail in 
U.S. Patent Nos. 4,816,397 to Boss et al. and 4,816,567 to Cabilly et al. 

An "antibody combining site" is that structural portion of an antibody 
molecule comprised of heavy and light chain variable and hypervariable regions that 
specifically binds antigen. 

The phrase "antibody molecule" in its various grammatical forms as 
used herein contemplates both an intact immunoglobulin molecule and an 
immunologically active portion of an immunoglobulin molecule. Exemplary 
antibody molecules are intact immunoglobulin molecules, substantially intact 
immunoglobulin molecules and those portions of an immunoglobulin molecule that 
contain the paratope, including those portions known in the art as Fab, Fab', F(ab')2
and F(v), which portions are preferred for use in the therapeutic methods described
herein. Fab and F(ab')2 portions of antibody molecules are prepared by the proteolytic
reaction of papain and pepsin, respectively, on substantially intact antibody molecules
by methods that are well-known. See for example, U.S. Patent No. 4,342,566 to
Theofilopoulos et al. Fab' antibody molecule portions are also well-known and are
produced from F(ab')2 portions followed by reduction of the disulfide bonds linking
the two heavy chain portions as with mercaptoethanol, and followed by alkylation of
the resulting protein mercaptan with a reagent such as iodoacetamide. An antibody
containing intact antibody molecules is preferred herein. 

The phrase "monoclonal antibody" in its various grammatical forms 
refers to an antibody having only one species of antibody combining site capable of 
immunoreacting with a particular antigen. A monoclonal antibody thus typically
displays a single binding affinity for any antigen with which it immunoreacts. A
monoclonal antibody may therefore contain an antibody molecule having a plurality
of antibody combining sites, each immunospecific for a different antigen; e.g., a
bispecific (chimeric) monoclonal antibody.

A DNA sequence is "operatively linked" to an expression control
sequence when the expression control sequence controls and regulates the
transcription and translation of that DNA sequence. The term "operatively linked"
includes having an appropriate start signal (e.g., ATG) in front of the DNA sequence
to be expressed and maintaining the correct reading frame to permit expression of the
DNA sequence under the control of the expression control sequence and production of
the desired product encoded by the DNA sequence. If a gene that one desires to insert 
into a recombinant DNA molecule does not contain an appropriate start signal, such a 
start signal can be inserted in front of the gene. 

The term "standard hybridization conditions" refers to salt and 
temperature conditions substantially equivalent to 5x SSC and 65°C for both
hybridization and wash. However, one skilled in the art will appreciate that such
"standard hybridization conditions" are dependent on particular conditions including
the concentration of sodium and magnesium in the buffer, nucleotide sequence length
and concentration, percent mismatch, percent formamide, and the like. Also
important in the determination of "standard hybridization conditions" is whether the
two sequences hybridizing are RNA-RNA, DNA-DNA or RNA-DNA. Such standard
hybridization conditions are easily determined by one skilled in the art according to
well known formulae, wherein hybridization is typically 10-20°C below the predicted
or determined Tm with washes of higher stringency, if desired.
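
The "well known formulae" are not reproduced in the text. As a minimal sketch, the following uses one commonly cited empirical approximation for long DNA-DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L, and then derives the 10-20°C window below Tm mentioned above. The choice of this particular formula, the function names, and the parameter names are assumptions made for illustration; the approximation is not suited to short oligonucleotides or to RNA-containing hybrids.

```python
import math


def estimate_tm_celsius(sodium_molar: float, gc_percent: float,
                        length_bp: int, formamide_percent: float = 0.0) -> float:
    """Approximate melting temperature of a long DNA-DNA hybrid (assumed formula)."""
    if sodium_molar <= 0 or length_bp <= 0:
        raise ValueError("sodium_molar and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(sodium_molar)   # salt dependence
            + 0.41 * gc_percent                 # base composition
            - 0.61 * formamide_percent          # denaturant lowers Tm
            - 500.0 / length_bp)                # shorter duplexes melt sooner


def hybridization_window(tm_celsius: float) -> tuple:
    """Temperature range 20 to 10 degrees C below Tm, per the text above."""
    return (tm_celsius - 20.0, tm_celsius - 10.0)


if __name__ == "__main__":
    tm = estimate_tm_celsius(sodium_molar=1.0, gc_percent=50.0, length_bp=500)
    print(round(tm, 1), hybridization_window(tm))
```

The sketch shows how salt concentration, base composition, formamide and duplex length each shift the estimate, which is why the conditions are described as dependent on those parameters.
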

In its primary aspect, the present invention concerns the identification
of a class of DNA Polymerase III-type enzymes or complexes found in thermophilic
bacteria such as Thermus thermophilus (T.th.), Aquifex aeolicus (A.ae.), Thermotoga
maritima (T.ma.), Bacillus stearothermophilus (B.st.) and other eubacteria which
exhibit the following characteristics, among their properties: the ability to extend a
primer over a long stretch of ssDNA at elevated temperature, stimulation by its
cognate sliding clamp of the type that is assembled on DNA by a clamp loader,
accessory subunits that exhibit DNA-stimulated ATPase activity at elevated
temperature and/or ionic strength, and an associated 3'-5' exonuclease activity. In a
particular aspect, the invention extends to Polymerase III-type enzymes derived from
a broad class of thermophilic eubacteria that include polymerases isolated from the
thermophilic bacteria Aquifex aeolicus (A.ae. polymerase) and other members of the
Aquifex genus; Thermus thermophilus (T.th. polymerase), Thermus flavus (Tfl/Tub
polymerase), Thermus ruber (Tru polymerase), Thermus brockianus
(DYNAZYME™ polymerase) and other members of the Thermus genus; Bacillus
stearothermophilus (Bst polymerase) and other members of the Bacillus genus;
Thermoplasma acidophilum (Tac polymerase) and other members of the
Thermoplasma genus; and Thermotoga neapolitana (Tne polymerase; See WO
96/10640 to Chatterjee et al.), Thermotoga maritima (Tma polymerase; See U.S.
Patent No. 5,374,553 to Gelfand et al.), and other members of the Thermotoga genus.
The particular polymerase discussed herein by way of illustration and not limitation,
is the enzyme derived from T.th., A.ae., T.ma., or B.st.

Polymerase III-type enzymes covered by the invention include those
that may be prepared by purification from cellular material, as described in detail in 
the Examples infra, as well as enzyme assemblies or complexes that comprise the 
combination of individually prepared enzyme subunits or components. Accordingly, 
the entire enzyme may be prepared by purification from cellular material, or may be 
constructed by the preparation of the individual components and their assembly into 
the functional enzyme. A representative and non-limitative protocol for the 
preparation of an enzyme by this latter route is set forth in U.S. Patent No. 5,583,026 
to O'Donnell, and the disclosure thereof is incorporated herein in its entirety for such 
purpose. 

Likewise, individual subunits may be modified, e.g. as by 
incorporation therein of single residue substitutions to create active sites therein, for 
the purpose of imparting new or enhanced properties to enzymes containing the 
modified subunits (see, e.g., Tabor, 1995). Likewise, individual subunits prepared in 
accordance with the invention, may be used individually and for example, may be 
substituted for their counterparts in other enzymes, to improve or particularize the 
properties of the resultant modified enzyme. Such modifications are within the skill
of the art and are considered to be included within the scope of the present invention. 

Accordingly, the invention includes the various subunits that may 
comprise the enzymes, and accordingly extends to the genes and corresponding 
proteins that may be encoded thereby, such as the α (as well as PolC), β, γ, ε, τ, δ and
δ' subunits, respectively. More particularly, in Thermus thermophilus the α subunit
corresponds to dnaE, the β subunit corresponds to dnaN, the ε subunit corresponds to
dnaQ, the γ and τ subunits correspond to dnaX, the δ subunit corresponds to
holA, and the δ' subunit corresponds to holB. In Aquifex aeolicus and Thermotoga
maritima, the α subunit corresponds to dnaE, the β subunit corresponds to dnaN, the ε
subunit corresponds to dnaQ, the τ subunit corresponds to dnaX, the δ subunit
corresponds to holA, and the δ' subunit corresponds to holB. In Bacillus
stearothermophilus, the PolC which has both α and ε activities corresponds to polC,
the β subunit corresponds to dnaN, the ε subunit corresponds to dnaQ, the τ subunit
corresponds to dnaX, the δ subunit corresponds to holA, and the δ' subunit
corresponds to holB. 

Accordingly, the Polymerase III-type enzyme of the present invention
comprises at least one gene encoding a subunit thereof, which gene is selected from
the group consisting of dnaX, dnaQ, dnaE, dnaN, holA, holB, and combinations
thereof. More particularly, the invention extends to the nucleic acid molecules
encoding them and their encoded subunits.

In the T.th. Pol III enzyme, this includes the following nucleotide
sequences: dnaX (SEQ. ID. No. 3), dnaE (SEQ. ID. No. 86), dnaQ (SEQ. ID. No. 94),
dnaN (SEQ. ID. No. 106), holA (SEQ. ID. No. 157), and holB (SEQ. ID. No. 155).

In the A.ae. Pol III enzyme, this includes the following nucleotide
sequences: dnaX (SEQ. ID. No. 119), dnaE (SEQ. ID. No. 117), dnaQ (SEQ. ID. No.
127), dnaN (SEQ. ID. No. 121), holA (SEQ. ID. No. 123), and holB (SEQ. ID. No.
125).

In the T.ma. Pol III enzyme, this includes the following nucleotide
sequences: dnaX (SEQ. ID. No. 141), dnaE (SEQ. ID. No. 137), dnaQ (SEQ. ID. No.
139), dnaN (SEQ. ID. No. 143), holA (SEQ. ID. No. 145), and holB (SEQ. ID. No.
147).

In the B.st. Pol III enzyme, this includes the following nucleotide
sequences: dnaX (SEQ. ID. No. 181), dnaN (SEQ. ID. No. 173), holA (SEQ. ID. No.
177), holB (SEQ. ID. No. 179), and polC (SEQ. ID. No. 183).

In each of the Pol III type enzymes of the present invention, not only 
are each of the above-identified coding sequences contemplated, but also conserved 
variants, active fragments and analogs thereof. 

A particular T.th. Polymerase III-type enzyme in accordance with the
invention may include at least one of the following subunits: a γ subunit having an
amino acid sequence corresponding to SEQ. ID. Nos. 4 and 5; a τ subunit having an
amino acid sequence corresponding to SEQ. ID. No. 2; an ε subunit having an amino
acid sequence corresponding to SEQ. ID. No. 95; an α subunit including an amino acid
sequence corresponding to SEQ. ID. No. 87; a β subunit having an amino acid sequence
corresponding to SEQ. ID. No. 107; a δ subunit having an amino acid sequence
corresponding to SEQ. ID. No. 158; a δ' subunit having an amino acid sequence
corresponding to SEQ. ID. No. 156; as well as variants, including allelic variants,
muteins, analogs and fragments of any of the subunits, and compatible combinations
thereof, capable of functioning in DNA amplification and sequencing.

A particular A.ae. Polymerase III-type enzyme in accordance with the
invention may include at least one of the following subunits: a τ subunit having an
amino acid sequence corresponding to SEQ. ID. No. 120; an ε subunit having an amino
acid sequence corresponding to SEQ. ID. No. 128; an α subunit including an amino
acid sequence corresponding to SEQ. ID. No. 118; a β subunit having an amino acid
sequence corresponding to SEQ. ID. No. 122; a δ subunit having an amino acid
sequence corresponding to SEQ. ID. No. 124; a δ' subunit having an amino acid
sequence corresponding to SEQ. ID. No. 126; as well as variants, including allelic
variants, muteins, analogs and fragments of any of the subunits, and compatible
combinations thereof, capable of functioning in DNA amplification and sequencing.

A particular T.ma. Polymerase III-type enzyme in accordance with the
invention may include at least one of the following subunits: a τ subunit having an
amino acid sequence corresponding to SEQ. ID. No. 142; an ε subunit having an amino
acid sequence corresponding to SEQ. ID. No. 140; an α subunit including an amino
acid sequence corresponding to SEQ. ID. No. 138; a β subunit having an amino acid
sequence corresponding to SEQ. ID. No. 144; a δ subunit having an amino acid
sequence corresponding to SEQ. ID. No. 146; a δ' subunit having an amino acid
sequence corresponding to SEQ. ID. No. 148; as well as variants, including allelic
variants, muteins, analogs and fragments of any of the subunits, and compatible
combinations thereof, capable of functioning in DNA amplification and sequencing.

A particular B.st. Polymerase III-type enzyme in accordance with the
invention may include at least one of the following subunits: a τ subunit having a
partial amino acid sequence corresponding to SEQ. ID. No. 182; a β subunit having
an amino acid sequence corresponding to SEQ. ID. No. 174; a δ subunit having an
amino acid sequence corresponding to SEQ. ID. No. 178; a δ' subunit having an
amino acid sequence corresponding to SEQ. ID. No. 180; a PolC subunit having an
amino acid sequence corresponding to SEQ. ID. No. 184; as well as variants,
including allelic variants, muteins, analogs and fragments of any of the subunits, and
compatible combinations thereof, capable of functioning in DNA amplification and
sequencing.

The invention also includes and extends to the use and application of
the enzyme and/or one or more of its components for DNA molecule amplification
and sequencing by the methods set forth hereinabove, and in greater detail later on
herein.

One of the subunits of the invention is the T.th. γ/τ subunit encoded by
a dnaX gene, which frameshifts as much as -2 with high efficiency, and that, upon
frameshifting, leads to the addition of more than one extra amino acid residue to the
C-terminus (to form the γ subunit). Further, the invention likewise extends to a dnaX
gene derived from a thermophile such as T.th., that possesses the frameshift defined
herein and that codes for expression of the γ and τ subunits of DNA Polymerase III.

The present invention provides methods for amplifying or sequencing
a nucleic acid molecule comprising contacting the nucleic acid molecule with a
composition comprising a DNA polymerase III enzyme (DNA pol III) complex (for
sequencing, preferably a DNA pol III complex that is substantially reduced in 3'-5'
exonuclease activity). DNA pol III complexes used in the methods of the present
invention are thermostable.

The invention also provides DNA molecules amplified by the present 
methods, methods of preparing a recombinant vector comprising inserting a DNA 
molecule amplified by the present methods into a vector, which is preferably an
expression vector, and recombinant vectors prepared by these methods. 

The invention also provides methods of preparing a recombinant host 
cell comprising inserting a DNA molecule amplified by the present methods into a 
host cell, which is preferably a bacterial cell, most preferably an Escherichia coli cell; a 
yeast cell; or an animal cell, most preferably an insect cell, a nematode cell or a 
mammalian cell. The invention also provides recombinant host cells prepared by 
these methods. 

In additional preferred embodiments, the present invention provides 
kits for amplifying or sequencing a nucleic acid molecule. DNA amplification kits 
according to the invention comprise a carrier means having in close confinement 
therein two or more container means, wherein a first container means contains a DNA 
polymerase III enzyme complex and a second container means contains a 
deoxynucleoside triphosphate. DNA sequencing kits according to the present 
invention comprise a first container means containing a multi-protein Pol III-type 
enzyme complex and a second container means containing a dideoxynucleoside 
triphosphate. The DNA pol III contained in the container means of such kits is 
preferably substantially reduced in 5'-3' exonuclease activity, may be thermostable, 
and may be isolated from the thermophilic cellular sources described above. 

DNA pol III-type enzyme complexes for use in the present invention 
may be isolated from any organism that produces the DNA pol III-type enzyme 
complexes naturally or recombinantly. Such enzyme complexes may be 
thermostable, isolated from a variety of thermophilic organisms. 

The thermostable DNA polymerase III-type enzymes or complexes 
that are an important aspect of this invention may be isolated from a variety of 
thermophilic bacteria that are available commercially (for example, from American 
Type Culture Collection, Rockville, Maryland). Suitable for use as sources of 
thermostable enzymes are the thermophilic eubacteria Aquifex aeolicus and other 
species of the Aquifex genus; Thermus aquaticus, Thermus thermophilus, Thermus 
flavus, Thermus ruber, Thermus brockianus, and other species of the Thermus genus; 
Bacillus stearothermophilus, Bacillus subtilis, and other species of the Bacillus genus; 
Thermoplasma acidophilus and other species of the Thermoplasma genus; 
Thermotoga neapolitana, Thermotoga maritima and other species of the Thermotoga 
genus; and mutants of each of these species. It will be understood by one of ordinary 
skill in the art, however, that any thermophilic microorganism might be used as a 
source of thermostable DNA pol III-type enzymes and polypeptides for use in the 
methods of the present invention. Bacterial cells may be grown according to standard 
microbiological techniques, using culture media and incubation conditions suitable for 
growing active cultures of the particular thermophilic species that are well-known to 
one of ordinary skill in the art (see, e.g., Brock et al., 1969; Oshima et al., 1974). 
Thermostable DNA pol III complexes may then be isolated from such thermophilic 
cellular sources as described for thermolabile complexes above. 

Several methods are available for identifying homologous nucleic 
acids and protein subunits in other thermophilic eubacteria, either those listed above 
or otherwise. These methods include the following: 

(1) The following procedure was used to obtain the genes encoding 
T.th. ε (dnaQ), τ/γ (dnaX), DnaA (dnaA), and β (dnaN). Protein sequences encoded 
by genes of non-thermophilic bacteria (i.e., mesophiles) are aligned to identify highly 
conserved amino acid sequences. PCR primers at conserved positions are designed 
using the codon usage of the organism of interest to amplify an internal section of the 
gene from genomic DNA extracted from the organism. The PCR product is 
sequenced. New primers are designed near the ends of the sequence to obtain new 
sequence that flanks the ends using circular PCR (also called inverse PCR) on 
genomic DNA that has been cut with the appropriate restriction enzyme and ligated 
into circles. These new PCR products are sequenced. The procedure is repeated until 
the entire gene sequence has been obtained. Also, dnaN (encoding β) is located next 
to dnaA in bacteria and, therefore, dnaN can be obtained by cloning DNA flanking the 
dnaA gene by the circular PCR procedure starting within dnaA. Once the gene is 
obtained, it is cloned into an expression vector for protein production. 
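
The first steps of Method 1 (align mesophile protein sequences, locate conserved 
stretches, and back-translate them using the codon usage of the target organism) can 
be illustrated with the minimal Python sketch below. It is an illustration only and is 
not the procedure actually used by the inventors. The function names, the toy 
alignment, and in particular the PREFERRED_CODON table are assumptions made 
for the example; a real codon table would have to be derived from known genes of the 
organism of interest, and real primers would normally span more residues. 

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL preferred-codon table for a GC-rich organism (illustrative only;
# a real table must be built from measured codon usage of the target organism).
PREFERRED_CODON: Dict[str, str] = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def conserved_blocks(aligned: List[str], min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (start_column, peptide) for runs of identical, gap-free columns."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    blocks: List[Tuple[int, str]] = []
    run_start: Optional[int] = None
    width = len(aligned[0])
    for col in range(width + 1):  # the extra step flushes a run that reaches the end
        same = (
            col < width
            and aligned[0][col] != "-"
            and all(s[col] == aligned[0][col] for s in aligned)
        )
        if same and run_start is None:
            run_start = col
        elif not same and run_start is not None:
            if col - run_start >= min_len:
                blocks.append((run_start, aligned[0][run_start:col]))
            run_start = None
    return blocks


def back_translate(peptide: str) -> str:
    """Back-translate a peptide using one preferred codon per amino acid."""
    return "".join(PREFERRED_CODON[aa] for aa in peptide)


def reverse_complement(dna: str) -> str:
    """Reverse complement, as needed for the downstream (reverse) primer."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    # Toy pre-aligned sequences ("-" marks an alignment gap); not real proteins.
    toy_alignment = ["AGHPDKLTWYECS", "SGHPDRLSWYECA", "TGHPD-LTWYECG"]
    blocks = conserved_blocks(toy_alignment, min_len=4)
    forward = back_translate(blocks[0][1])
    reverse = reverse_complement(back_translate(blocks[-1][1]))
    print(blocks, forward, reverse)
```

The forward primer is taken from the first conserved block and the reverse primer 
from the reverse complement of the last one, so that the pair brackets an internal 
section of the gene as described above. 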

(2) The following procedure was used to obtain the gene encoding 
T.th. α polymerase (dnaE gene). The DNA polymerase III can be purified directly 
from the organism of interest and amino acid sequence of the subunit(s) obtained 
directly. In the case of T.th., T.th. cells were lysed and proteins were fractionated. An 
antibody against E. coli α, which reacted with T.th. α, was used to probe column 
fractions by Western analysis. The T.th. α was transferred to a membrane, proteolyzed, 
and fragments were sequenced. The sequence was used to design PCR primers for 
amplification of an internal section of the dnaE gene. Remaining flanking sequences 
are then obtained by circular PCR. 

(3) The following procedure can be used to identify published 
nucleotide sequences which have not yet been identified as to their function. This 
method was used to obtain T.th. δ (holA) and δ' (holB), although they could 
presumably also have been obtained via Methods 1 and 2 above. Discovery of T.th. 
dnaE (α), dnaN (β) and dnaX (τ/γ) indicates that thermophiles use a class III type of 
DNA polymerase (α) that utilizes a clamp (β) and must also use a clamp loader since 
they have τ/γ. Also, the biochemical experiments in the Examples infra show that the 
T.th. polymerase functions with the T.th. β clamp. Having demonstrated that a 
thermophile (e.g., T.th.) does indeed utilize a class III type of polymerase with a 
clamp and clamp loader, it can be assumed that they may have δ and δ' subunits 
needed to form a complex with τ/γ for functional clamp loading activity (i.e., as shown 
in E. coli, δ and δ' bind either τ or γ to form a τδδ' or γδδ' complex, both of which are 
functional clamp loaders). The δ subunit is not very well conserved, but does give a 
match in the sequence databases for A.ae., T.ma., and T.th. The T.th. database 
provided limited information on the amino acid sequence of the δ subunit, although one 
can easily obtain the complete sequence of T.th. holA by PCR and circular PCR as 
outlined above in Method 1. The A.ae. and T.ma. databases are complete and, 
therefore, the entire holA sequence from each of these genomes is identified. Neither 
database recognized these sequences as δ encoded by holA. The δ' subunit (holB) is 
fairly well conserved. Again the incomplete T.th. database provided limited δ' 
sequence, but as with δ, it is a straightforward process for anyone experienced in the 
area to obtain the rest of the holB sequence using PCR and circular PCR as described 
in Method 1. Neither the A.ae. nor T.ma. databases recognized holB encoding δ'. 
Nevertheless, holB was identified as encoding δ' by searching the databases with δ' 
sequence. In each case, the Thermotoga maritima and Aquifex aeolicus holB gene 
and δ' sequence were obtained in their entirety. Neither database had previously 
annotated holA or holB encoding δ and δ'. 

As stated above and in accordance with the present invention, once 
nucleic acid molecules have been obtained, they may be amplified according to any of 
the literature-described manual or automated amplification methods. Such methods 
include, but are not limited to, PCR (U.S. Patent No. 4,683,195 to Mullis et al. and 
U.S. Patent No. 4,683,202 to Mullis), Strand Displacement Amplification (SDA) 
(U.S. Patent No. 5,455,166 to Walker), and Nucleic Acid Sequence-Based 
Amplification (NASBA) (U.S. Patent No. 5,409,818 to Davey et al.; EP 329,822 to 
Davey et al.). Most preferably, nucleic acid molecules are amplified by the methods 
of the present invention using PCR-based amplification techniques. 

In the initial steps of each of these amplification methods, the nucleic 
acid molecule to be amplified is contacted with a composition comprising a DNA 
polymerase belonging to the evolutionary "family A" class (e.g., Taq DNA pol I or E. 
coli pol I) or the "family B" class (e.g., Vent and Pfu DNA polymerases - see Ito 
and Braithwaite, 1991). All of these DNA polymerases are present as single subunits 
and are primarily involved in DNA repair. In contrast, the DNA pol III-type enzymes 
are multisubunit complexes that mainly function in the replication of the 
chromosome, and the subunit containing the DNA polymerase activity is in the 
"family C" class. 

Thus, in amplifying a nucleic acid molecule according to the methods 
of the present invention, the nucleic acid molecule is contacted with a composition 
comprising a thermostable DNA pol III-type enzyme complex. 

Once the nucleic acid molecule to be amplified is contacted with the 
DNA pol III-type complex, the amplification reaction may proceed according to 
standard protocols for each of the above-described techniques. Since most of these 
techniques comprise a high-temperature denaturation step, if a thermolabile DNA pol 
III-type enzyme complex is used in nucleic acid amplification by any of these 
techniques, the enzyme would need to be added at the start of each amplification 
cycle, since it would be heat-inactivated at the denaturation step. However, a 
thermostable DNA pol III-type complex used in these methods need only be added 
once at the start of the amplification (as for Taq DNA polymerase in traditional PCR 
amplifications), as its activity will be unaffected by the high temperature of the 
denaturation step. It should be noted, however, that because DNA pol III-type 
enzymes may have a much more rapid rate of nucleotide incorporation than the 
polymerases commonly used in these amplification techniques, the cycle times may 
need to be adjusted to shorter intervals than would be standard. 
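
The effect of incorporation rate on cycle time is simple arithmetic, sketched below in 
Python. The function name and both rates used in the example are hypothetical 
placeholders chosen only to show the calculation; they are not measured values for 
any enzyme described herein. 

```python
def extension_seconds(template_nt: int, rate_nt_per_s: float) -> float:
    """Minimum time needed to copy a template at a constant incorporation rate."""
    if template_nt <= 0 or rate_nt_per_s <= 0:
        raise ValueError("template length and rate must be positive")
    return template_nt / rate_nt_per_s


if __name__ == "__main__":
    template = 5000  # nucleotides to be copied in one extension step
    # HYPOTHETICAL rates (nucleotides per second), for illustration only.
    for label, rate in (("slower polymerase", 50.0), ("faster polymerase", 500.0)):
        print(label, extension_seconds(template, rate), "s")
```

Under these assumed rates, a tenfold faster polymerase shortens the extension step 
for the same template tenfold, which is the adjustment the paragraph above refers to. 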

In an alternative preferred embodiment, the invention provides 
methods of extending primers for several kilobases, a reaction that is central to 
amplifying large nucleic acid molecules, by a technique commonly referred to as 
"long chain PCR" (Barnes, 1994; Cheng, 1994). 

In such a method the target primed DNA can contain a single strand 
stretch of DNA to be copied into the double strand form of several or tens of 
kilobases. The reaction is performed in a suitable buffer, preferably Tris, at a pH of 
between 5.5 - 9.5, preferably 7.5. The reaction also contains MgCl2 in the range of 1 
mM to 10 mM, preferably 8 mM, and may contain a suitable salt such as NaCl, KCl 
or sodium or potassium acetate. The reaction also contains ATP in the range of 20 
µM to 1 mM, preferably 0.5 mM, that is needed for the clamp loader to assemble the 
clamp onto the primed template, and a sufficient concentration of deoxynucleoside 
triphosphates in the range of 50 µM to 0.5 mM, preferably 60 µM, for chain extension. 
The reaction contains a sliding clamp, such as the β subunit, in the range of 20 ng to 
200 ng, preferably 100 ng, for action as a clamp to stimulate the DNA polymerase. 
The chain extension reaction contains a DNA polymerase and a clamp loader, that 
could be added either separately or as a single Pol III*-like particle, preferably as a 
Pol III*-like particle that contains the DNA polymerase and clamp loading activities. 
The Pol III-type enzyme is added preferably at a concentration of about 0.0002-200 
units per milliliter, about 0.002-100 units per milliliter, about 0.2-50 units per 
milliliter, and most preferably about 2-50 units per milliliter. The reaction is 
incubated at elevated temperature, preferably 60°C or more, and could include other 
proteins to enhance activity such as a single strand DNA binding protein. 
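
The ranges recited above for the long chain extension reaction can be collected into 
a small lookup table and used to check a proposed reaction mix, as in the Python 
sketch below. The numeric ranges and preferred values are those stated in the 
preceding paragraph; the dictionary keys, the function name, and the choice of 10 
units per milliliter for the enzyme (an arbitrary point inside the most preferred 2-50 
range) are assumptions of this example. 

```python
from typing import Dict, List, Tuple

# (low, high) limits as recited in the text; key names and units are this sketch's own.
RANGES: Dict[str, Tuple[float, float]] = {
    "ph": (5.5, 9.5),
    "mgcl2_mM": (1.0, 10.0),
    "atp_uM": (20.0, 1000.0),          # 20 uM to 1 mM
    "dntp_uM": (50.0, 500.0),          # 50 uM to 0.5 mM
    "beta_clamp_ng": (20.0, 200.0),
    "enzyme_units_per_ml": (0.0002, 200.0),
}

# Preferred values as recited in the text.
PREFERRED: Dict[str, float] = {
    "ph": 7.5,
    "mgcl2_mM": 8.0,
    "atp_uM": 500.0,                   # 0.5 mM
    "dntp_uM": 60.0,
    "beta_clamp_ng": 100.0,
}

PREFERRED_MIN_TEMP_C = 60.0            # "preferably 60 C or more"


def check_reaction(mix: Dict[str, float], temp_c: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    for name, (low, high) in RANGES.items():
        if name not in mix:
            problems.append(f"{name}: missing")
        elif not low <= mix[name] <= high:
            problems.append(f"{name}: {mix[name]} outside {low}-{high}")
    if temp_c < PREFERRED_MIN_TEMP_C:
        problems.append(f"temperature: {temp_c} below preferred {PREFERRED_MIN_TEMP_C}")
    return problems


if __name__ == "__main__":
    # 10 units/ml is an ARBITRARY pick within the most preferred 2-50 range.
    mix = {**PREFERRED, "enzyme_units_per_ml": 10.0}
    print(check_reaction(mix, temp_c=65.0))
    print(check_reaction({**mix, "mgcl2_mM": 12.0}, temp_c=55.0))
```

The clamp-loader-free reaction on linear templates uses the same buffer, MgCl2 and 
deoxynucleoside triphosphate ranges but omits ATP and allows more β clamp, so a 
second table with those entries changed would be needed for that embodiment. 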

In another preferred embodiment, the invention provides methods of 
extending primers on linear templates in the absence of the clamp loader. In this 
reaction, the primers are annealed to the linear DNA, preferably at the ends such as in 
standard PCR applications. The reaction is performed in a suitable buffer, preferably 
Tris, at a pH of between 5.5 - 9.5, preferably 7.5. The reaction also contains MgCl2 in 
the range of 1 mM to 10 mM, preferably 8 mM, and may contain a suitable salt such 
as NaCl, KCl or sodium or potassium acetate. The reaction also contains a sufficient 
concentration of deoxynucleoside triphosphates in the range of 50 µM to 0.5 mM, 
preferably 60 µM, for chain extension. The reaction contains a sliding clamp, such as 
the β subunit, in the range of 20 ng to 20 µg, preferably about 2 µg, for ability to slide 
on the end of the DNA and associate with the polymerase for action as a clamp to 
stimulate the DNA polymerase. The chain extension reaction also contains a Pol III-type 
polymerase subunit such as α, core, or a Pol III*-like particle. The Pol III-type 
enzyme is added preferably at a concentration of about 0.0002-200 units per 
milliliter, about 0.002-100 units per milliliter, about 0.2-50 units per milliliter, and 
most preferably about 2-50 units per milliliter. The reaction is incubated at elevated 
temperature, preferably 60°C or more, and could include other proteins to enhance 
activity such as a single strand DNA binding protein. 

The methods of the present invention thus will provide high-fidelity 
amplified copies of a nucleic acid molecule in a more rapid fashion than traditional 
amplification methods using the repair-type enzymes. 

These amplified nucleic acid molecules may then be manipulated 
according to standard recombinant DNA techniques. For example, a nucleic acid 
molecule amplified according to the present methods may be inserted into a vector, 
which is preferably an expression vector, to produce a recombinant vector comprising 
the amplified nucleic acid molecule. This vector may then be inserted into a host cell, 
where it may, for example, direct the host cell to produce a recombinant polypeptide 
encoded by the amplified nucleic acid molecule. Methods for inserting nucleic acid 
molecules into vectors, and inserting these vectors into host cells, are well-known to 
one of ordinary skill in the art (see, e.g., Maniatis, 1992). 

Alternatively, the amplified nucleic acid molecules may be directly 
inserted into a host cell, where they may be incorporated into the host cell genome or 
may exist as extrachromosomal nucleic acid molecules, thereby producing a 
recombinant host cell. Methods for introduction of a nucleic acid molecule into a host 
cell, including calcium phosphate transfection, DEAE-dextran mediated transfection, 
cationic lipid-mediated transfection, electroporation, transduction, infection or other 
methods, are described in many standard laboratory manuals (see, e.g., Davis, 1986). 

For each of the above techniques wherein an amplified nucleic acid 
molecule is introduced into a host cell via a vector or via direct introduction, preferred 
host cells include but are not limited to a bacterial cell, a yeast cell, or an animal cell. 
Bacterial host cells preferred in the present invention are E. coli, Bacillus spp., 
Streptomyces spp., Erwinia spp., Klebsiella spp. and Salmonella typhimurium. 
Preferred as a host cell is E. coli, and particularly preferred are E. coli strains DH10B 
and Stbl2, which are available commercially (Life Technologies, Inc. Gaithersburg, 
Maryland). Preferred animal host cells are insect cells, nematode cells and 
mammalian cells. Insect host cells preferred in the present invention are Drosophila 
spp. cells, Spodoptera Sf9 and Sf21 cells, and Trichoplusia High-Five cells, each of 
which is available commercially (e.g., from Invitrogen; San Diego, California). 
Preferred nematode host cells are those derived from C. elegans, and preferred 
mammalian host cells are those derived from rodents, particularly rats, mice or 
hamsters, and primates, particularly monkeys and humans. Particularly preferred as 
mammalian host cells are CHO cells, COS cells and VERO cells. 

By the present invention, nucleic acid molecules may be sequenced 
according to any of the literature-described manual or automated sequencing methods. 
Such methods include, but are not limited to, dideoxy sequencing methods such as 
"Sanger sequencing" (Sanger and Coulson, 1975; Sanger et al., 1977; U.S. Patent No. 
4,962,022 to Fleming et al.; and U.S. Patent No. 5,498,523 to Tabor et al.), as well as 
more complex PCR-based nucleic acid fingerprinting techniques such as Random 
Amplified Polymorphic DNA (RAPD) analysis (Williams et al., 1990), Arbitrarily 
Primed PCR (AP-PCR) (Welsh and McClelland, 1990), DNA Amplification 
Fingerprinting (DAF) (Caetano-Anolles, 1991), microsatellite PCR or Directed 
Amplification of Minisatellite-region DNA (DAMD) (Heath et al., 1993), and 
Amplification Fragment Length Polymorphism (AFLP) analysis (EP 534,858 to Vos 
et al.; Vos et al., 1995; Lin and Kuo, 1995). 

As described above for amplification methods, the nucleic acid 
molecule to be sequenced by these methods is typically contacted with a composition 
comprising a type A or type B DNA polymerase. By contrast, in sequencing a nucleic 
acid molecule according to the methods of the present invention, the nucleic acid 
molecule is contacted with a composition comprising a thermostable DNA pol III-type 
enzyme complex instead of necessarily using a DNA polymerase of the family A 
or B classes. As for amplification methods, the DNA pol III-type complexes used in 
the nucleic acid sequencing methods of the present invention are preferably 
substantially reduced in 3'-5' exonuclease activity; most preferable for use in the 
present methods is a DNA polymerase III-type complex which lacks the ε subunit. 
DNA pol III-type complexes used for nucleic acid sequencing according to the 
present methods are used at the same preferred concentration ranges described above 
for long chain extension of primers. 

Once the nucleic acid molecule to be sequenced is contacted with the 
DNA pol III complex, the sequencing reactions may proceed according to the 
protocols disclosed in the above-referenced techniques. 

As discussed above, the invention extends to kits for use in nucleic 
acid amplification or sequencing utilizing DNA polymerase III-type enzymes 
according to the present methods. A DNA amplification kit according to the present 
invention may comprise a carrier means having in close confinement therein two or 
more container means, such as vials, tubes, bottles and the like. A 
first such container means may contain a DNA polymerase III-type enzyme complex, 
and a second such container means may contain a deoxynucleoside triphosphate. The 
amplification kit encompassed by this aspect of the present invention may further 
comprise additional reagents and compounds necessary for carrying out standard 
nucleic acid amplification protocols (See U.S. Patent No. 4,683,195 to Mullis et al. and 
U.S. Patent No. 4,683,202 to Mullis, which are directed to methods of DNA 
amplification by PCR). 

Similarly, a DNA sequencing kit according to the present invention 
comprises a carrier means having in close confinement therein two or more container 
means, such as vials, tubes, bottles and the like. A first such container means may 
contain a DNA polymerase III-type enzyme complex, and a second such container 
means may contain a dideoxynucleoside triphosphate. The sequencing kit may 
further comprise additional reagents and compounds necessary for carrying out 
standard nucleic acid sequencing protocols, such as pyrophosphatase, agarose or 
polyacrylamide media for formulating sequencing gels, and other components 
necessary for detection of sequenced nucleic acids (See U.S. Patent No. 4,962,020 to 
Fleming et al. and U.S. Patent No. 5,498,523 to Tabor et al., which are directed to 
methods of DNA sequencing). 

The DNA polymerase III-type complex contained in the first container 
means of the amplification and sequencing kits provided by the invention is 
preferably a thermostable DNA polymerase III-type enzyme complex and more 
preferably a DNA polymerase III-type enzyme complex that is reduced in 3'-5' 
exonuclease activity. Naturally, the foregoing methods and kits are presented as 
illustrative and not restrictive of the use and application of the enzymes of the 
invention for DNA molecule amplification and sequencing. Likewise, the 
applications of specific embodiments of the enzymes, including conserved variants 
and active fragments thereof are considered to be disclosed and included within the 
scope of the invention. 

As discussed earlier, individual subunits could be modified to 
customize enzyme construction and corresponding use and activity. For example, the 
region of α that interacts with β could be subcloned onto another DNA polymerase, 
thereby causing β to enhance the activity of the recombinant polymerase. 
Alternatively, the β clamp could be modified to function with another protein or 
enzyme thereby enhancing its activity or acting to localize its action to a particular 
targeted DNA. Finally, the polymerase active site could be modified to enhance its 
action, for example changing Tyrosine enabling more equal site stoppage with the 
four ddNTPs (Tabor et al., 1995). This represents a particular non-limiting 
illustration of the scope and practice of the present invention with reference to the 
utility of individual subunits hereof. 

Accordingly and as stated above, the present invention also relates to a 
recombinant DNA molecule or cloned gene, or a degenerate variant thereof, which 
encodes any one or all of the subunits of the DNA Polymerase III-type enzymes of the 
present invention, or active fragments thereof. In the instance of the τ subunit, a 
predicted molecular weight of about 58 kD and an amino acid sequence set forth in 
SEQ ID Nos. 4 or 5 is comprehended; preferably a nucleic acid molecule, in particular 
a recombinant DNA molecule or cloned gene, encoding the 58 kD subunit of the 
Polymerase III of the invention, that has a nucleotide sequence or is complementary to 
a DNA sequence shown in FIGURES 4A and 4B (SEQ ID No. 1), and the coding 
region for dnaX set forth in FIGURE 4C (SEQ ID No. 3). The γ subunit is smaller, 
and is approximately 50 kD, depending upon the extent of the frameshift that occurs. 
More particularly, and as set forth in FIGURE 4E (SEQ ID No. 4), the γ subunit 
defined by a -1 frameshift possesses a molecular weight of 50.8 kD, while the γ 
subunit defined by a -2 frameshift, set forth in FIGURE 4F (SEQ ID No. 5), possesses 
a molecular weight of 49.8 kD. 
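
How a -1 or -2 frameshift changes the reading frame, and therefore the C-terminal 
residues and the position of the stop codon, can be illustrated with the short Python 
sketch below. The sequence used is a made-up toy, not the dnaX sequence of any 
figure, and the function names and the slip position are assumptions of the example; 
the lengths of the toy products carry no meaning for the actual γ and τ subunits. 

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from position 0 until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(dna: str, slip_site: int, shift: int) -> str:
    """Translate in frame up to slip_site, then resume reading `shift` nt back.

    slip_site is a 0-based nucleotide index that must fall on a codon boundary;
    shift must be -1 or -2.
    """
    if slip_site % 3 != 0 or shift not in (-1, -2):
        raise ValueError("slip_site must be a multiple of 3 and shift -1 or -2")
    upstream = translate(dna[:slip_site])
    if len(upstream) * 3 < slip_site:  # a stop codon occurred before the slip site
        return upstream
    return upstream + translate(dna[slip_site + shift:])


if __name__ == "__main__":
    TOY = "ATGAAAAAAGGTAACCTGACCGAGTAA"  # toy sequence, NOT a dnaX sequence
    print("in frame :", translate(TOY))
    print("-1 shift :", translate_with_frameshift(TOY, slip_site=9, shift=-1))
    print("-2 shift :", translate_with_frameshift(TOY, slip_site=9, shift=-2))
```

In this toy, each shifted frame adds a few residues that differ from the in-frame 
product and then reaches a stop codon earlier than the unshifted frame, which is the 
general mechanism by which a frameshift yields a shorter product with a distinct 
C-terminus. 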

As discussed above, the invention also extends to the genes including 
holA, holB, dnaX, dnaQ, dnaE, and dnaN from thermophilic eubacteria (i.e., T.th. and 
A.ae.) that have been isolated and/or purified, to corresponding vectors for the genes, 
and particularly, to the vectors disclosed herein, and to host cells including such 
vectors. In this connection, probes have been prepared which hybridize to the DNA 
polymerase III-type enzymes of the present invention, and which are selected from 
the various oligonucleotide probes or primers set forth in the present application. 
These include, without limitation, the oligonucleotide defined in SEQ ID No. 6, the 
oligonucleotide defined in SEQ ID No. 8, the oligonucleotide defined in SEQ ID No. 
10, the oligonucleotide defined in SEQ ID No. 11, the oligonucleotide defined in SEQ 
ID No. 12, the oligonucleotide defined in SEQ ID No. 13, the oligonucleotide defined 
in SEQ ID No. 14, the oligonucleotide defined in SEQ ID No. 15, and the 
oligonucleotide defined in SEQ ID No. 16. 

The methods of the invention include a method for producing a 
recombinant thermostable DNA polymerase III-type enzyme from a thermophilic 
bacterium, such as T.th., A.ae., T.ma., or B.st., which comprises culturing a host cell 
transformed with a vector of the invention under conditions suitable for the expression 
of the present DNA polymerase III. Another method includes a method for isolating a 
target DNA fragment consisting essentially of a DNA coding for a thermostable DNA 
polymerase III-type enzyme from a thermophilic bacterium comprising the steps of: 

(a) forming a genomic library from the bacterium; 

(b) transforming or transfecting an appropriate host cell with the 
library of step (a); 

(c) contacting DNA from the transformed or transfected host cell with 
a DNA probe which hybridizes to a DNA fragment selected from the group consisting 
of the DNA fragments defined in SEQ ID No. 6 and the DNA fragments defined in 
SEQ ID No. 8 or the oligonucleotides set forth above; wherein hybridization is 
conducted under the following conditions: 

i) hybridization: 1% crystalline BSA (fraction V) (Sigma), 
1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS at 65°C for 12 hours; and 

ii) wash: 5 x 20 minutes with wash buffer consisting of 
0.5% BSA (fraction V), 1 mM Na2EDTA, 40 mM NaHPO4 (pH 7.2), and 5% SDS; 

(d) assaying the transformed or transfected cell of step (c) which 
hybridizes to the DNA probe for DNA polymerase III-type activity; and 

(e) isolating a target DNA fragment which codes for the thermostable 
DNA polymerase III-type enzyme. 

Also, antibodies including both polyclonal and monoclonal antibodies, 
and the DNA Polymerase III-like enzyme complex and/or their γ and τ subunits, α 
subunit(s), δ subunit, δ' subunit, β subunit, ε subunit may be used in the preparation 
of the enzymes of the present invention as well as other enzymes of similar 
thermophilic origin. For example, the DNA Polymerase III-type complex or its 
subunits may be used to produce both polyclonal and monoclonal antibodies to 
themselves in a variety of cellular media, by known techniques such as the hybridoma 
technique utilizing, for example, fused mouse spleen lymphocytes and myeloma cells. 

The general methodology for making monoclonal antibodies by 
hybridomas is well known. Immortal, antibody-producing cell lines can also be 
created by techniques other than fusion, such as direct transformation of B 
lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g., 
Schreier et al., 1980; Hammerling et al., 1981; Kennett et al., 1980; see also U.S. 
Patent No. 4,341,761 to Ganfield et al.; U.S. Patent No. 4,399,121 to Albarella et al.; 
U.S. Patent No. 4,427,783 to Newman et al.; U.S. Patent No. 4,444,887 to Hoffman; 
U.S. Patent No. 4,451,570 to Royston et al.; U.S. Patent No. 4,466,917 to 
Nussenzweig et al.; U.S. Patent No. 4,472,500 to Milstein et al.; U.S. Patent No. 
4,491,632 to Wands et al.; and U.S. Patent No. 4,493,890 to Morris. 

Methods for producing polyclonal anti-polypeptide antibodies are 
well-known in the art. See U.S. Patent No. 4,493,795 to Nestor et al. A monoclonal 
antibody, typically containing Fab and/or F(ab')2 portions of useful antibody 
molecules, can be prepared using the hybridoma technology described in Antibodies - 
A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor Laboratory, New 
York (1988), which is incorporated herein by reference. Briefly, to form the 
hybridoma from which the monoclonal antibody composition is produced, a myeloma 
or other self-perpetuating cell line is fused with lymphocytes obtained from the spleen 
of a mammal hyperimmunized with an elastin-binding portion thereof. 

A monoclonal antibody useful in practicing the present invention can 
be produced by initiating a monoclonal hybridoma culture comprising a nutrient 
medium containing a hybridoma that secretes antibody molecules of the appropriate 
antigen specificity. The culture is maintained under conditions and for a time period 
sufficient for the hybridoma to secrete the antibody molecules into the medium. The 
antibody-containing medium is then collected. The antibody molecules can then be 
further isolated by well-known techniques. 

Media useful for the preparation of these compositions are both well- 
known in the art and commercially available and include synthetic culture media, 
inbred mice and the like. An exemplary synthetic medium is Dulbecco's minimal 
essential medium (DMEM) (Dulbecco et al., 1959) supplemented with 4.5 gm/l 
glucose, 20 mM glutamine, and 20% fetal calf serum. An exemplary inbred mouse 
strain is the Balb/c. 

Another feature of this invention is the expression of the DNA 
sequences disclosed herein. As is well known in the art, DNA sequences may be 
expressed by operatively linking them to an expression control sequence in an 
appropriate expression vector and employing that expression vector to transform an 
appropriate unicellular host. 

Such operative linking of a DNA sequence of this invention to an 
expression control sequence, of course, includes, if not already part of the DNA 
sequence, the provision of an initiation codon, ATG, in the correct reading frame 
upstream of the DNA sequence. 

A wide variety of host/expression vector combinations may be 
employed in expressing the DNA sequences of this invention. Useful expression 
vectors, for example, may consist of segments of chromosomal, non-chromosomal 
and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and 
known bacterial plasmids, e.g., E. coli plasmids col E1, pCR1, pBR322, pMB9 and 
their derivatives, plasmids such as RP4; phage DNAs, e.g., the numerous derivatives 
of phage λ, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single 
stranded phage DNA; yeast plasmids such as the 2µ plasmid or derivatives thereof; 
vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; 
vectors derived from combinations of plasmids and phage DNAs, such as plasmids 
that have been modified to employ phage DNA or other expression control sequences; 
and the like. 

Any of a wide variety of expression control sequences -- sequences 
that control the expression of a DNA sequence operatively linked to it -- may be used 
in these vectors to express the DNA sequences of this invention. Such useful 
expression control sequences include, for example, the early or late promoters of 
SV40, CMV, vaccinia, polyoma or adenovirus, the lac system, the trp system, the 
TAC system, the TRC system, the LTR system, the major operator and promoter 
regions of phage λ, the control regions of fd coat protein, the promoter for 
3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid 
phosphatase (e.g., Pho5), the promoters of the yeast α-mating factors, and other 
sequences known to control the expression of genes of prokaryotic or eukaryotic cells 
or their viruses, and various combinations thereof. 

A wide variety of unicellular host cells are also useful in expressing the 
DNA sequences of this invention. These hosts may include well known eukaryotic 
and prokaryotic hosts, such as strains of E. coli, Pseudomonas, Bacillus, 
Streptomyces, fungi such as yeasts, and animal cells, such as CHO, R1.1, B-W and 
L-M cells, African Green Monkey kidney cells (e.g., COS 1, COS 7, BSC1, BSC40, 
and BMT10), insect cells (e.g., Sf9), and human cells and plant cells in tissue culture. 

It will be understood that not all vectors, expression control sequences
and hosts will function equally well to express the DNA sequences of this invention.
Neither will all hosts function equally well with the same expression system.
However, one skilled in the art will be able to select the proper vectors, expression
control sequences, and hosts without undue experimentation to accomplish the desired
expression without departing from the scope of this invention. For example, in
selecting a vector, the host must be considered because the vector must function in it.
The vector's copy number, the ability to control that copy number, and the expression
of any other proteins encoded by the vector, such as antibiotic markers, will also be
considered.

In selecting an expression control sequence, a variety of factors will
normally be considered. These include, for example, the relative strength of the
system, its controllability, and its compatibility with the particular DNA sequence or
gene to be expressed, particularly with regard to potential secondary structures.
Suitable unicellular hosts will be selected by consideration of, e.g., their compatibility
with the chosen vector, their secretion characteristics, their ability to fold proteins
correctly, and their fermentation requirements, as well as the toxicity to the host of the
product encoded by the DNA sequences to be expressed, and the ease of purification
of the expression products.

Considering these and other factors, a person skilled in the art will be
able to construct a variety of vector/expression control sequence/host combinations
that will express the DNA sequences of this invention on fermentation or in large
scale animal culture.

It is further intended that analogs may be prepared from nucleotide
sequences of the protein complex/subunit derived within the scope of the present
invention. Analogs, such as fragments, may be produced, for example, by pepsin
digestion of bacterial material. Other analogs, such as muteins, can be produced by
standard site-directed mutagenesis of dnaX, dnaE, dnaQ, dnaN, holA, or holB coding
sequences. Especially useful may be a mutation in dnaE that provides the polymerase
with the ability to incorporate all four ddNTPs with equal efficiency, thereby
producing an even banding pattern in sequencing gels, as discussed above and with
reference to Tabor et al., 1995.

As mentioned above, a DNA sequence corresponding to dnaX, dnaQ,
holA, holB, dnaE, or dnaN, or encoding the subunits of the DNA Polymerase III of the
invention can be prepared synthetically rather than cloned. The DNA sequence can
be designed with the appropriate codons for the amino acid sequence of the subunit(s)
of interest. In general, one will select preferred codons for the intended host if the
sequence will be used for expression. The complete coding sequence is assembled from
overlapping oligonucleotides prepared by standard methods
(Edge, 1981; Nambair et al., 1984; Jay et al., 1984).
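
By way of illustration only, the selection of preferred codons for an intended host can be sketched as a simple back-translation. The preferred-codon table below is a hypothetical, simplified choice of one commonly used E. coli codon per amino acid; it is an assumption made for this sketch and is not taken from this specification. The function names are likewise illustrative.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# ASSUMPTION: one illustrative "preferred" codon per amino acid for an
# E. coli host. A real design would use a measured codon usage table.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# Sanity check: every chosen codon really encodes its amino acid.
for amino_acid, codon in PREFERRED_CODON.items():
    assert CODON_TABLE[codon] == amino_acid


def back_translate(peptide: str) -> str:
    """Return a coding sequence for `peptide` using the preferred codons."""
    return "".join(PREFERRED_CODON[aa] for aa in peptide)


def translate(dna: str) -> str:
    """Translate complete codons of `dna` in frame 0, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    designed = back_translate("HAYLFSGT")
    print(designed, translate(designed))
```

The round trip through `translate` is a cheap check that the designed sequence still encodes the intended peptide before it is split into overlapping oligonucleotides.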

Synthetic DNA sequences allow convenient construction of genes
which will express DNA Polymerase III analogs or "muteins". Alternatively, DNA
encoding muteins can be made by site-directed mutagenesis of native dnaX, dnaQ,
holA, holB, dnaE or dnaN genes or their corresponding cDNAs, and muteins can be
made directly using conventional polypeptide synthesis.

A general method for site-specific incorporation of unnatural amino 
acids into proteins is described in Noren et al., 1989. This method may be used to 
create analogs with unnatural amino acids. 

GENERAL DESCRIPTION OF THE INVENTION

As discussed above, the present invention has as one of its
characterizing features, that a Polymerase III-type enzyme as defined hereinabove,
has been discovered in a thermophile, that has the structure and function of a
chromosomal replicase. This structure and function confers significant benefit when
the enzyme is employed in procedures such as PCR where speed and accuracy of
DNA reconstruction is crucial.

Chromosomal replicases are composed of several subunits in all
organisms (Kornberg and Baker, 1992). In keeping with the need to replicate long
chromosomes, replicases are rapid and highly processive multiprotein machines. All
cellular replicases examined to date derive their processivity from one subunit that is
shaped like a ring and completely encircles DNA (Kuriyan and O'Donnell, 1993;
Kelman and O'Donnell, 1994). This "sliding clamp" subunit acts as a mobile tether
for the polymerase machine (Stukenberg et al., 1991). The sliding clamp does not
assemble onto the DNA by itself, but requires a complex of several proteins, called a
"clamp loader", which couples ATP hydrolysis to the assembly of sliding clamps onto
DNA (O'Donnell et al., 1992). Hence, Pol III-type cellular replicases are comprised
of three components: a clamp, a clamp loader, and the DNA polymerase.

An overall goal is to identify and isolate all of the genes encoding the
replicase subunits from a thermophile for expression and purification in large
quantity. Following this, the replication apparatus can be reassembled from
individual subunit components for use in kits, PCR, sequencing and diagnostic
applications (Onrust et al., 1995).

As a beginning to identify and characterize the replicase of a
thermophile, we started by looking for a homologue to the prokaryotic dnaX gene
which encodes subunits (γ and τ) of the replicase. The dnaX gene has another
homologue, holB, which encodes yet another subunit (δ') of the replicase. The amino
acid sequences of the δ' subunit (encoded by holB) and the τ/γ subunits (encoded by
dnaX) are particularly highly conserved in evolution from prokaryotes to eukaryotes
(Chen et al., 1992; O'Donnell et al., 1993; Onrust et al., 1993; Carter et al., 1993;
Cullman et al., 1995).

One organism chosen for study and exposition herein is the exemplary
extreme thermophile Thermus thermophilus (T.th.). It is understood that other
members of the class such as the eubacterium Thermotoga are expected to be
analogous in both structure and function. Thus, the investigation of T.th. proceeded
and initially, a T.th. homologue of dnaX was identified. The gene encodes a full
length protein of 529 amino acids. The amino terminal third of the sequence shares
over 50% homology to dnaX genes as divergent as E. coli (gram negative) and B.
subtilis (gram positive). The T.th. dnaX gene contains a DNA sequence that provides
a translational frameshift signal for production of two proteins from the same gene.
Such frameshifting has been documented only in the case of E. coli (Tsuchihashi and
Kornberg, 1990; Flower and McHenry, 1990; Blinkowa and Walker, 1990). No
frameshifting has been documented to occur in the dnaX homologues (RFC subunit
genes) of yeast and humans (Eukaryotic kingdom).

The presence of a dnaX gene that produces two subunits implies that
T.th. has a clamp loader (γ) and may be organized by τ into a Pol III*-type replicase
like the replicative DNA polymerase of Escherichia coli, DNA polymerase III
holoenzyme. The E. coli DNA polymerase III holoenzyme contains 10 different
subunits, some in copies of two or more for a total composition of 18 polypeptide
chains (Kornberg and Baker, 1992; Onrust et al., 1995). The holoenzyme is
composed of three major activities: the 3-subunit DNA polymerase core (αεθ), the β
subunit DNA sliding clamp, and the 5-subunit γ complex clamp loader (γδδ'χψ). This
3 component strategy generalizes to eukaryotes which utilize a clamp (PCNA) and a
5-subunit RFC clamp loader (RFC) which provide processivity to DNA polymerase δ
(reviewed in Kelman and O'Donnell, 1994).

In E. coli, the polymerase and clamp loader components are organized
into one Pol III* particle by the τ subunit, that acts as a "glue" protein (Onrust et al.,
1995). One dimer of τ holds together two core polymerases in the particle which are
utilized for the coordinated and simultaneous replication of both strands of duplex
DNA (McHenry, 1982; Maki et al., 1988; Yuzhakov et al., 1996). The "glue" protein
τ subunit also binds one clamp loader (called γ complex) thereby acting as a scaffold
for a large superstructure assembly called DNA polymerase III*. The gene encoding
τ, called dnaX, also encodes the γ subunit of DNA polymerase III. The β subunit then
associates with Pol III* to form the DNA polymerase III holoenzyme. The γ subunit
is approximately 2/3 the length of τ. γ shares the N-terminus of τ, but is truncated by
a translational frameshifting mechanism that, after the shift, encounters a stop codon
within two amino acids (Tsuchihashi and Kornberg, 1990; Flower and McHenry,
1990; Blinkowa and Walker, 1990). Hence, γ is the N-terminal 453 amino acids of τ,
but contains one unique residue at the C-terminus (the penultimate codon encodes a
Lys residue which is the same sequence as if the frameshift did not take place). This
frameshift is highly efficient and occurs approximately 50% of the time.
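
The way a -1 translational frameshift yields a truncated product that shares its N-terminus with the full-length product can be sketched on a toy sequence. The 24-nucleotide sequence and the slip position below are hypothetical and chosen only for illustration; they are not the dnaX sequence of any organism, and the helper names are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, start: int = 0) -> str:
    """Translate `seq` from index `start` until a stop codon or the end."""
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def translate_with_minus1_shift(seq: str, slip_at: int) -> str:
    """Read frame 0 up to nucleotide index `slip_at`, then step back one
    nucleotide and continue in the -1 frame until a stop codon.

    Assumes `slip_at` is a multiple of 3 and that no stop codon occurs
    in frame 0 before the slip.
    """
    if slip_at % 3 != 0:
        raise ValueError("slip_at must fall on a codon boundary")
    return translate(seq[:slip_at]) + translate(seq, start=slip_at - 1)


# HYPOTHETICAL toy gene: ATG GCA AAA AAG AGT GAA CCG TAA
TOY_GENE = "ATGGCAAAAAAGAGTGAACCGTAA"

full_length = translate(TOY_GENE)                             # frame 0 product
truncated = translate_with_minus1_shift(TOY_GENE, slip_at=12)  # -1 frame product

if __name__ == "__main__":
    print("full length:", full_length)
    print("frameshift :", truncated)
```

In this toy case the shifted frame reads one new codon and then a stop codon, so the short product carries a single unique C-terminal residue after the shared N-terminal portion, which mirrors the relationship between γ and τ described above.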

The sequences of the γ and τ subunits encoded by the dnaX gene are
homologous to the clamp loading subunits in all other organisms extending from
gram negative bacteria through gram positive bacteria, the Archaea Kingdom and the
Eukaryotic Kingdom from yeast to humans (O'Donnell et al., 1993). All of these
organisms utilize a three component replicase (DNA polymerase, clamp and clamp
loader) and in these cases the 3 components appear to behave as independent units in
solution rather than forming a large holoenzyme superstructure. For example, in
eukaryotes from yeast to humans, the clamp loader is the five subunit RFC, the clamp
is PCNA, and the polymerases δ and ε are both stimulated by the PCNA clamp
assembled onto primed DNA by RFC (reviewed in Kelman and O'Donnell, 1994).

The discovery of a dnaX gene in T.th. provided confidence that
thermophilic bacteria would contain a three component Pol III-type enzyme. Hence,
we proceeded to identify the dnaQ and dnaN genes encoding, respectively, the
proofreading 3'-5' exonuclease, and the β DNA sliding clamp subunits of a Pol
III-type enzyme. Following this, we purified from extracts of T.th. cells, a Pol III-type
enzyme. This enzyme preparation had the unique property of extending a single
primer around a long 7.2 kb single strand DNA genome of M13mp18 bacteriophage.
Such a primer extension assay serves as a tool to detect and identify the Pol III-type of
enzyme in cell extracts. The enzyme was confirmed to be a Pol III-type enzyme
based on its reactivity with antibody directed against the E. coli α subunit (the DNA
polymerase subunit) and antibody directed against E. coli γ subunit. Proteins
corresponding to α, τ, γ, δ and δ' were easily visible and lend themselves to
identification of the genes through use of peptide microsequencing followed by
primer design for PCR amplification. For example, from this DNA Pol III-type
preparation, the peptide sequence of the α subunit was obtained, which then allowed
the dnaE gene encoding the α subunit (DNA polymerase) of the Pol III-type enzyme
to be obtained.

These methods should be widely applicable to other thermophilic
bacteria. Additional antibody reagents against other Pol III-type enzyme components,
such as RFC subunits, DNA polymerase delta, epsilon or beta, and the PCNA clamp
from known organisms can be made quite easily as polyclonal or monoclonal
antibody preparations using as antigen either naturally purified sequence, recombinant
sequence, or synthetic peptide sequence. Examples of known sequences of these Pol
III-type enzymes are to be found in: DNA polymerases (Braithwaite and Ito, 1993),
RFC clamp loaders (Cullman et al., 1995) and PCNA (Kelman and O'Donnell, 1995).

The remaining genes of T.th. Pol III needed for efficient extension of
primed templates, holA and holB, are now identified. The holA coding sequence
(SEQ. ID. No. 157) encodes the δ subunit (SEQ. ID. No. 158) and the holB coding
sequence (SEQ. ID. No. 155) encodes the δ' subunit (SEQ. ID. No. 156). The holA
and holB coding sequences and the δ and δ' subunits were identified via BLAST
search (Altschul et al., 1997), and subsequently isolated following circular PCR.
These genes will provide the subunit preparations through use of standard
recombinant techniques and protein purification protocols. The protein subunits can
then be used to reconstitute the enzyme complexes as they exist in the cell. This type
of reconstitution of Pol III has been demonstrated using the protein subunits of DNA
polymerase III holoenzyme from E. coli to assemble the entire particle. See, e.g.,
U.S. Patent Nos. 5,583,026 and 5,668,004 to O'Donnell; and Onrust et al., 1995. The
disclosures of these references are incorporated herein in their entireties.

Another organism chosen for study and exposition herein is the
extreme thermophile Aquifex aeolicus. Thus, the present invention also relates to
various isolated DNA molecules from Aquifex aeolicus, in particular the DNA
molecules encoding various replication proteins. These include dnaE, dnaX, dnaN,
holA, holB, ssb DNA molecules from A. aeolicus. These DNA molecules can be
inserted into an expression system or used to transform host cells from which isolated
proteins can be obtained. The isolated proteins encoded by these DNA molecules are
also disclosed.

Unless otherwise indicated below, the Aquifex aeolicus sequences were 
obtained by sequence comparisons using the Thermus thermophilus counterparts as 
query against the genome of Aquifex aeolicus (Deckert et al., 1998). 

The A. aeolicus dnaE gene has a nucleotide coding sequence according
to SEQ. ID. No. 117 and encodes the α subunit of DNA Polymerase III, which
has an amino acid sequence according to SEQ. ID. No. 118. The A.ae. α subunit has
approximately 41% aa identity to the T.th. α subunit.

The A. aeolicus dnaX gene has a nucleotide coding sequence according
to SEQ. ID. No. 119 and encodes the τ subunit of DNA Polymerase III, which
has an amino acid sequence according to SEQ. ID. No. 120. The A.ae. τ subunit has
approximately 51% aa identity to the T.th. τ subunit.

The A. aeolicus dnaN gene has a nucleotide coding sequence according
to SEQ. ID. No. 121 and encodes the β subunit of DNA Polymerase III, which has an
amino acid sequence according to SEQ. ID. No. 122. The A.ae. β subunit has
approximately 27% aa identity to the T.th. β subunit.

The A. aeolicus dnaQ gene has a nucleotide coding sequence
according to SEQ. ID. No. 127 and encodes the ε subunit of DNA Polymerase
III, which has an amino acid sequence according to SEQ. ID. No. 128. The A.ae. ε
subunit has approximately 26% aa identity to the T.th. ε subunit.

The A. aeolicus ssb gene has a nucleotide coding sequence according
to SEQ. ID. No. 129 and encodes the SSB protein, which has an amino acid sequence
according to SEQ. ID. No. 130. The A.ae. SSB protein has approximately 22% aa
identity to the T.th. SSB protein.

Further, the coding sequences of A. aeolicus genes encoding the
helicase (dnaB), helicase loader (dnaC), and primase (dnaG) are also disclosed. The
A. aeolicus dnaB gene has a nucleotide coding sequence according to SEQ. ID. No.
131 and encodes the DnaB protein, which functions as a helicase and has an amino
acid sequence according to SEQ. ID. No. 132. The A. aeolicus dnaG gene has a
nucleotide coding sequence according to SEQ. ID. No. 133 and encodes the DnaG
protein, which functions as a primase and has an amino acid sequence according to
SEQ. ID. No. 134. The A. aeolicus dnaC gene has a nucleotide coding sequence
according to SEQ. ID. No. 135 and encodes the DnaC protein, which functions as a
helicase loader and has an amino acid sequence according to SEQ. ID. No. 136.

The A. aeolicus holA and holB genes were previously unidentified by
Deckert et al., 1998. Using the Thermus thermophilus δ' subunit amino acid sequence
and the Thermotoga maritima δ subunit amino acid sequence (SEQ. ID. No. 146,
which itself was obtained using the T.th. δ subunit amino acid sequence of SEQ. ID.
No. 158) in separate BLAST searches (Altschul et al., 1997), corresponding
polypeptide products in Aquifex aeolicus were identified. The A. aeolicus holA gene
has a nucleotide coding sequence according to SEQ. ID. No. 123 and encodes the δ
subunit of DNA Polymerase III, which has an amino acid sequence according
to SEQ. ID. No. 124. The A.ae. δ subunit has approximately 21% aa identity to the
T.m. δ subunit. The A. aeolicus holB gene has a nucleotide coding sequence
according to SEQ. ID. No. 125 and encodes the δ' subunit of DNA Polymerase
III, which has an amino acid sequence according to SEQ. ID. No. 126. The A.ae. δ'
subunit has approximately 24% aa identity to the T.th. δ' subunit.

This invention also clones at least the coding regions of a set of A.
aeolicus genes which encode proteins that assemble into an A. aeolicus DNA
polymerase III replication enzyme. These genes (dnaE, dnaN, dnaX, dnaQ, holA,
holB, ssb) were cloned into expression vectors, the proteins were expressed in E. coli,
and the corresponding protein subunits were purified (alpha, beta, tau, delta, delta
prime, SSB). This invention identifies the major protein-protein contacts among these
subunits, shows how these proteins can be assembled into higher order multiprotein
complexes, and how to form a rapid and processive DNA polymerase III holoenzyme.

In contrast to the E. coli and T. thermophilus dnaX genes which encode
both τ and γ subunits, the A. aeolicus dnaX gene produces only the full length τ
subunit when expressed in E. coli. The A. aeolicus τ is intermediate in length
between the γ and τ subunits of E. coli DNA polymerase III holoenzyme. The E. coli
τ binds α, whereas the γ subunit does not bind α. Due to the intermediate size of A.
aeolicus τ, it was not known whether the A. aeolicus τ would bind the α subunit. This
invention shows that indeed, the A. aeolicus τ binds to α, as well as δ and δ', thereby
forming an A. aeolicus ατδδ' complex. Until the identification of the δ and δ' subunits
by the present invention, their existence, let alone their interaction with τ and α, was
not even known.

The A. aeolicus ατδδ'/β Pol III can be applied in several useful DNA
handling techniques. For example, the thermophilic Pol III will be useful in DNA
sequencing, especially at high temperature. Also, use of a thermal resistant rapid and
processive Pol III is an important improvement to polymerase chain reaction
technology. The ability of the A. aeolicus Pol III to extend primers for multiple
kilobases makes possible the amplification of very long segments of DNA (long chain
PCR).

Another organism chosen for study and exposition herein is the
extreme thermophile Thermotoga maritima. Thus, the present invention also relates
to various isolated DNA molecules from Thermotoga maritima, in particular the DNA
molecules encoding various replication proteins. These include dnaE, dnaX, dnaN,
dnaQ, holA, holB, ssb DNA molecules from Thermotoga maritima. These DNA
molecules can be inserted into an expression system or used to transform host cells
from which isolated proteins can be obtained. The isolated proteins encoded by these
DNA molecules are also disclosed.

Unless otherwise indicated below, the Thermotoga maritima sequences
were obtained by sequence comparisons using the Thermus thermophilus counterparts
as query against the genome of Thermotoga maritima (Nelson et al., 1999).

The T. maritima dnaE gene has a nucleotide coding sequence
according to SEQ. ID. No. 137 and encodes the α subunit of DNA Polymerase
III, which has an amino acid sequence according to SEQ. ID. No. 138. The T.m. α
subunit has approximately 33% aa identity to the T.th. α subunit.

The T. maritima dnaQ gene has a nucleotide coding sequence
according to SEQ. ID. No. 139 and encodes the ε subunit of DNA Polymerase
III, which has an amino acid sequence according to SEQ. ID. No. 140. The T.m. ε
subunit has approximately 34% aa identity to the T.th. ε subunit.

The T. maritima dnaX gene has a nucleotide coding sequence
according to SEQ. ID. No. 141 and encodes the τ subunit of DNA Polymerase
III, which has an amino acid sequence according to SEQ. ID. No. 142. The T.m. τ
subunit has approximately 48% aa identity to the T.th. τ subunit.

The T. maritima dnaN gene has a nucleotide coding sequence
according to SEQ. ID. No. 143 and encodes the β subunit of DNA Polymerase III,
which has an amino acid sequence according to SEQ. ID. No. 144. The T.m. β
subunit has approximately 28% aa identity to the T.th. β subunit.

The T. maritima ssb gene has a nucleotide coding sequence according
to SEQ. ID. No. 149 and encodes the SSB protein, which has an amino acid sequence
according to SEQ. ID. No. 150. The T.m. SSB protein has approximately 18% aa
identity to the T.th. SSB protein.

Further, the coding sequences of T. maritima genes encoding the
helicase (dnaB) and primase (dnaG) are also disclosed. The T. maritima dnaB gene
has a nucleotide coding sequence according to SEQ. ID. No. 151 and encodes the
DnaB protein, which functions as a helicase and has an amino acid sequence
according to SEQ. ID. No. 152. The T. maritima dnaG gene has a nucleotide coding
sequence according to SEQ. ID. No. 153 and encodes the DnaG protein, which
functions as a primase and has an amino acid sequence according to SEQ. ID. No.
154.

The T. maritima holA and holB genes were previously unidentified by
Nelson et al., 1999. Using the Thermus thermophilus δ and δ' subunit amino acid
sequences (SEQ. ID. Nos. 158 and 156, respectively) in separate BLAST searches
(Altschul et al., 1997), corresponding polypeptide products in T. maritima were
identified. The T. maritima holA gene has a nucleotide coding sequence according to
SEQ. ID. No. 145 and encodes the δ subunit of DNA Polymerase III, which has
an amino acid sequence according to SEQ. ID. No. 146. The T.m. δ subunit has
approximately 37% aa identity to the T.th. δ subunit. The T.m. holB gene has a
nucleotide coding sequence according to SEQ. ID. No. 147 and encodes the δ' subunit
which has an amino acid sequence according to SEQ. ID. No. 148. The T.m. δ'
subunit has approximately 25% aa identity to the T.th. δ' subunit.

Yet another organism chosen for study and exposition herein is the 
extreme thermophile Bacillus stearothermophilus. Thus, the present invention also 
relates to various isolated DNA molecules from Bacillus stearothermophilus, in 
particular the DNA molecules encoding various replication proteins. These include 
dnaE, dnaX, dnaN, dnaQ, holA, holB, ssb DNA molecules from Bacillus 
stearothermophilus. These DNA molecules can be inserted into an expression system 
or used to transform host cells from which isolated proteins can be obtained. The 
isolated proteins encoded by these DNA molecules are also disclosed. 

Unless otherwise indicated below, the Bacillus stearothermophilus 
sequences were obtained by searching the database of this organism (at 
http://www.genome.ou.edu). 

The B. stearothermophilus polC gene has a nucleotide coding sequence
according to SEQ. ID. No. 183 and encodes the PolC or α-large subunit of the DNA
Polymerase III, which has an amino acid sequence according to SEQ. ID. No. 184.
The B.st. PolC subunit, like the PolC subunits of other Gram positive organisms,
contains both polymerase and 3'-5' exonuclease activity. This subunit, therefore, is
essentially a fusion of α and ε.

The B. stearothermophilus dnaX gene has a partial nucleotide coding
sequence according to SEQ. ID. No. 181 and encodes the τ subunit of DNA
Polymerase III, which has a partial amino acid sequence according to SEQ. ID.
No. 182. The B.st. τ subunit has approximately 31% aa identity to the T.th. τ subunit.

The B. stearothermophilus dnaN gene has a partial nucleotide coding
sequence according to SEQ. ID. No. 173 and encodes the β subunit of DNA
Polymerase III, which has a partial amino acid sequence according to SEQ. ID.
No. 174. The B.st. β subunit has approximately 21% aa identity to the T.th. β subunit.

The B. stearothermophilus ssb gene has a nucleotide coding sequence
according to SEQ. ID. No. 175 and encodes the SSB protein, which has an amino acid
sequence according to SEQ. ID. No. 176. The B.st. SSB protein has approximately
23% aa identity to the T.th. SSB protein.

The B. stearothermophilus holA gene has a nucleotide coding sequence
according to SEQ. ID. No. 177 and encodes the δ subunit of DNA Polymerase III,
which has an amino acid sequence according to SEQ. ID. No. 178. The B.st. δ
subunit has approximately 26% aa identity to the T.th. δ subunit.

The B. stearothermophilus holB gene has a nucleotide coding sequence
according to SEQ. ID. No. 179 and encodes the δ' subunit of DNA Polymerase III,
which has an amino acid sequence according to SEQ. ID. No. 180. The B.st. δ'
subunit has approximately 25% aa identity to the T.th. δ' subunit.

By conducting BLAST searches of unidentified genomic DNA from 
other thermophilic eubacteria, it is possible to identify coding regions which encode 
various functional subunits of other Pol III replicative machinery. 
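
The "aa identity" figures quoted for each subunit pair can be understood through a short sketch of how percent identity is computed from a pairwise alignment. The convention used here (identical residues divided by aligned columns, counting columns that contain a gap in one sequence) is an assumption made for illustration; this specification does not state which denominator was used, and BLAST reports identity over its own local alignment. The aligned strings are toy examples.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    ASSUMPTION: the denominator is the number of alignment columns,
    excluding columns where both rows carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment: 8 columns, 6 identical residues.
    print(percent_identity("HAYLFSGT", "HAYL-SGS"))
```

Because the result depends on the chosen denominator, identity values are best compared only when computed under the same convention.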

Although it is generally appreciated that proteins isolated from a
thermophile should retain activity at high temperature, there is no guarantee that they
will retain temperature resistance when isolated in pure form. This invention shows
that the A. aeolicus Pol III, like the T. thermophilus Pol III, is resistant to high
temperature. It is expected that the T. maritima and B. stearothermophilus Pol III
enzymes will similarly be resistant to high temperature.

The following experiments illustrate the identification and
characterization of the enzymes and constructs of the present invention. Accordingly,
in Examples 1-8 below, the identification and expression of the γ and τ subunits is
presented, as the first step in the elucidation of the Thermus thermophilus Polymerase III
reflective of the present invention. Examples 9-12 which follow set forth the protocol
for the purification of the remainder of the subunits of the enzyme that represent
substantial entirety of the functional replicative machinery of the enzyme.
Examples 18-30 demonstrate the preparation of isolated A. aeolicus sequences and
Pol III subunits and their thermostable use.

EXAMPLE 1
EXPERIMENTAL PROCEDURES

Materials

DNA modification enzymes were from New England Biolabs.
Labelled nucleotides were from Amersham, and unlabeled nucleotides were from
New England Biolabs. The Alter-1 vector was from Promega. pET plasmids and E.
coli strains BL21(DE3) and BL21(DE3)pLysS were from Novagen.
Oligonucleotides were from Operon. Buffer A is 20 mM Tris-HCl (pH 7.5), 0.1 mM
EDTA, 5 mM DTT, and 10% glycerol.
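
For convenience, the preparation of Buffer A from concentrated stocks can be worked through with the usual dilution relation (stock volume = final concentration / stock concentration x total volume). The stock concentrations below (1 M Tris-HCl, 0.5 M EDTA, 1 M DTT, 100% glycerol) and the 1 liter batch size are assumptions chosen for this sketch; only the final concentrations come from the text above.

```python
# Final concentrations of Buffer A as stated above (mM, or % for glycerol).
BUFFER_A = {
    "Tris-HCl (pH 7.5)": 20.0,   # mM
    "EDTA": 0.1,                 # mM
    "DTT": 5.0,                  # mM
    "glycerol": 10.0,            # percent
}

# ASSUMPTION: hypothetical stock solutions, in the same units as above.
STOCKS = {
    "Tris-HCl (pH 7.5)": 1000.0,  # 1 M
    "EDTA": 500.0,                # 0.5 M
    "DTT": 1000.0,                # 1 M
    "glycerol": 100.0,            # 100 %
}


def stock_volumes_ml(final: dict, stocks: dict, total_ml: float) -> dict:
    """Volume of each stock (ml) needed for `total_ml` of buffer (C1*V1 = C2*V2)."""
    return {name: total_ml * final[name] / stocks[name] for name in final}


volumes = stock_volumes_ml(BUFFER_A, STOCKS, total_ml=1000.0)
water_ml = 1000.0 - sum(volumes.values())

if __name__ == "__main__":
    for name, ml in volumes.items():
        print(f"{name}: {ml:.1f} ml")
    print(f"water to volume: {water_ml:.1f} ml")
```

With these assumed stocks, a 1 liter batch takes 20 ml Tris-HCl, 0.2 ml EDTA, 5 ml DTT and 100 ml glycerol, with water making up the remainder.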

Genomic DNA 

Thermus thermophilus (strain HB8) was obtained from the American
Type Culture Collection. Genomic DNA was prepared from cells grown in 0.1 l of
Thermus medium N697 (ATCC: 4 g yeast extract, 8.0 g polypeptone (BBL 11910),
2.0 g NaCl, 30.0 g agar, 1.0 L distilled water) at 75°C overnight. Cells were collected
by centrifugation at 4°C and the cell pellet was resuspended in 25 ml of 100 mM
Tris-HCl (pH 8.0), 0.05 M EDTA, 2 mg/ml lysozyme and incubated at room
temperature for 10 min. Then 25 ml 0.10 M EDTA (pH 8.0), 6% SDS was added and
mixed followed by 60 ml of phenol. The mixture was shaken for 40 min. followed by
centrifugation at 10,000 x g for 10 min. at room temperature. The upper phase (50
ml) was removed and mixed with 50 ml of phenol:chloroform (50:50 v/v) for 30 min.
followed by centrifugation for 10 min. at room temperature. The upper phase was
decanted and the DNA was precipitated upon addition of 1/10th volume 3 M sodium
acetate (pH 6.5) and 1 volume ethanol. The precipitate was collected by
centrifugation and washed twice with 2 ml of 80% ethanol, dried and resuspended in
1 ml T.E. buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA).

Cloning of dnaX

DNA oligonucleotides for amplification of T.th. genomic DNA were as
follows. The upstream 32mer (5'-CGCAAGCTTCACGCSTACCTSTTCTCCGGSAC-3',
S indicating a mixture of G and C) (SEQ. ID. No. 6) consists of a HindIII site
within the first 9 nucleotides (underlined) followed by codons (SEQ. ID. No. 29)
encoding the following amino acid sequence (HAYLFSGT) (SEQ. ID. No. 7). The
downstream 34mer (5'-CGCGAATTCGTGCTCSGGSGGCTCCTCSAGSGTC-3')
(SEQ. ID. No. 8) consists of an EcoRI site (underlined) followed by codons (SEQ. ID.
No. 30) encoding the sequence KTLEEPPEH (SEQ. ID. No. 9) on the complementary
strand. The amplification reactions contained 10 ng T.th. genomic DNA, 0.5 mM of
each primer, in a volume of 100 µl of Vent polymerase reaction mixture according to
the manufacturer's instructions (10 µl ThermoPol Buffer, 0.5 mM each dNTP and 0.5
mM MgSO4). Amplification was performed using the following cycling scheme: 5
cycles of: 30 sec. at 95.5°C, 30 sec. at 40°C, 2 min. at 72°C; 5 cycles of: 30 sec. at
95.5°C, 30 sec. at 45°C, and 2 min. at 72°C; and 30 cycles of: 30 sec. at 95.5°C, 30
sec. at 50°C, and 30 sec. at 72°C. Products were visualized in a 1.5% native agarose
gel.
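
The design of the two degenerate primers described above can be checked with a short sketch: each S position is expanded to G or C, the restriction site is located, and the coding portion of every variant is translated. The primer sequences and site sequences (HindIII, AAGCTT; EcoRI, GAATTC) are as given above or standard; the helper names, and the choice to skip the first nine nucleotides (three leading nucleotides plus the six-nucleotide site) before translating, are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Only the IUPAC codes that occur in these primers (S = G or C).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

UPSTREAM = "CGCAAGCTTCACGCSTACCTSTTCTCCGGSAC"      # 32mer, SEQ. ID. No. 6
DOWNSTREAM = "CGCGAATTCGTGCTCSGGSGGCTCCTCSAGSGTC"  # 34mer, SEQ. ID. No. 8
HINDIII = "AAGCTT"
ECORI = "GAATTC"


def expand(primer: str) -> list:
    """List every non-degenerate sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate the complete codons of `seq` in frame 0."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Upstream primer: coding portion starts after the 9-nt tail with the site.
up_peptides = {translate(v[9:]) for v in expand(UPSTREAM)}
# Downstream primer anneals to the coding strand, so read its reverse
# complement; the first nucleotide belongs to the preceding (Lys) codon.
down_peptides = {translate(reverse_complement(v[9:])[1:]) for v in expand(DOWNSTREAM)}

if __name__ == "__main__":
    print(len(expand(UPSTREAM)), "upstream variants ->", up_peptides)
    print(len(expand(DOWNSTREAM)), "downstream variants ->", down_peptides)
    print("HindIII at", UPSTREAM.find(HINDIII), "EcoRI at", DOWNSTREAM.find(ECORI))
```

Every variant of the upstream primer encodes the complete codons of HAYLFSG (the final T codon is only partly covered by the primer), and every variant of the downstream primer reads TLEEPPEH on the complementary strand, which is consistent with the peptide sequences stated above.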

Genomic DNA was digested with either XhoI, XbaI, StuI, PstI, NcoI,
MM, KpnI, HindIII, EcoRI, EagI, BglI, or BamHI, followed by Southern analysis in
a native agarose gel (Maniatis et al., 1982). Approximately 0.5 µg of digest was
analyzed in each lane of a 0.8% native agarose gel followed by transfer to an MSI
filter (Micron Separations Inc.). The transfer included the following steps:

1. The agarose gel was soaked in 500 ml of 1% HCl with gentle shaking for 10 min.

2. Then the gel was soaked in 500 ml of 0.5 M NaOH + 1.5 M NaCl for 40 min.

3. After that the gel was soaked in 500 ml of 1 M ammonium acetate for 1 h.

4. The DNA was transferred to the MSI filter with the use of blotting paper for 4 h.

5. The filter was kept at 80°C for 15 min. in the oven.

6. The pre-hybridization step was run in 10 ml of Hybridization solution (1%
crystalline BSA (fraction V) (Sigma), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7%
SDS) at 65°C for 30 min.

7. The probe, radiolabeled by the random priming method (see below), was added to
the pre-hybridization solution and kept at 65°C for 12 h.

8. The filter was washed with low stringency with 200 ml of the wash buffer (0.5%
BSA (fraction V), 1 mM Na2EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS) with gentle
shaking for 20 min. This step was repeated 5 times, followed by exposure to X-ray
film (XAR-5, Kodak).

As a probe, the PCR product was radiolabeled by random priming as follows.

1. 14 µl of the mixture containing 0.2 µg of PCR product DNA, 1 µg of the pd(N6)
(Promega) and 2.5 µl of the 10X Klenow reaction buffer (100 mM Tris-HCl (pH 7.5),
50 mM MgCl2, 75 mM dithiothreitol) were boiled for 10 min. and then kept at 4°C.

2. The reaction volume was increased up to 25 µl, containing in addition 33 µM of
each dNTP, except dATP, 10 µCi [α-32P] dATP (800 Ci/mM), and 2 units of Klenow
enzyme. The reaction mixture was incubated 1.5 h.

3. 2 mg of sonicated herring sperm DNA (GibcoBRL) was added to the reaction and
the volume was increased to 2 ml using hybridization solution. The sample was then
boiled for 10 min.

A genomic library of XbaI digested DNA was prepared upon treating 1
µg genomic T.th. DNA with 10 units of XbaI in 100 µl of NEBuffer N2 (50 mM
NaCl, 10 mM Tris-HCl (pH 7.9), 10 mM MgCl2, 1 mM DTT) for 2 h at 37°C. The
digested DNA was purified by phenol/chloroform extraction and ethanol
precipitation. The Alter-1 vector (0.5 µg) (Promega) was digested with 1 unit of XbaI
in NEBuffer N2 and then purified by phenol/chloroform extraction and ethanol
precipitation. One microgram of genomic digest was incubated with 0.05 µg of
digested Alter-1 and 20 U of T4 ligase in 30 µl of ligase buffer (50 mM Tris-HCl (pH
7.8), 10 mM MgCl2, 10 mM DTT and 1 mM ATP) at 15°C for 12 h. The ligation
reaction was transformed into the DH5α strain of E. coli and transformants were
plated on LB plates containing ampicillin and screened for the dnaX insert using the
radiolabeled PCR probe as follows:

1. The colonies tested were lifted onto MSI filters, approximately 100 colonies to 
each filter. 

2. The filters, removed from the LB/Tc plates, were placed side up on a sheet of 
Whatman 3 MM paper soaked with 0.5 M NaOH for 5 min. 

3. The filters were transferred to a sheet of paper soaked with 1 M Tris-HCl (pH 7.5) 
for 5 min. 

4. The filters were placed on a sheet of paper soaked in 0.5 M Tris-HCl (pH 7.5), 
1.25 M NaCl for 5 min. 

5. After air drying, the filters were heated in the oven at 80°C for 15 min. and then
were analyzed by Southern hybridization.

Plasmid DNA was prepared from 20 positive colonies; of these 6 contained the
expected 4 kb insert when digested with XbaI. Sequencing of the insert was
performed by the Sanger method using the Vent polymerase sequencing kit according
to the manufacturer's instructions (New England Biolabs).

Identification of the dnaX gene 

The dnaX genes of the gram negative E. coli and the gram positive B. 
subtilis share more than 50% identity in amino acid sequence within the N-terminal 
180 residues containing the ATP-binding domain (Fig. 2). Two highly conserved 
regions (shown in bold in Fig. 2) were used to design oligonucleotide primers for 
application of the polymerase chain reaction to T.th. genomic DNA. The expected 
PCR product, including the restriction sites (i.e. before cutting) is 345 nucleotides. 
Use of these primers with genomic T.th. DNA resulted in a product of the expected 
size. The PCR product was then radiolabeled and used to probe genomic DNA in a 
Southern analysis (Fig. 3). Genomic DNA was digested with several different 
restriction endonucleases, electrophoresed in a native agarose gel and then probed 
with the PCR fragment. The Southern analysis showed an XbaI fragment of
approximately 4 kb, more than sufficient length to encode the dnaX gene. Other 
restriction nucleases produced fragments that were significantly longer, or produced 
two or more fragments indicating presence of a site within the coding sequence of 
dnaX. 

To obtain full length dnaX, genomic DNA was digested with XbaI and
ligated into XbaI digested Alter-1 vector. Ligated DNA was transformed into DH5α
cells, and colonies were screened with the labeled PCR probe. Plasmid DNA
was prepared from 20 positive colonies and analyzed for the appropriate sized insert
using XbaI. Six of the twenty clones contained the expected 4 kb XbaI fragment as
an insert, the sequence of which is shown in Figs. 4A and 4B.

The frameshift site 

The dnaX gene of E. coli produces two proteins, the γ and τ subunits,
by a -1 frameshift (Tsuchihashi and Kornberg, 1990; Flower and McHenry, 1990;
Blinkowa and Walker, 1990). The full length product yields τ, and the frameshift
results in addition of one amino acid before encountering a stop codon to produce γ.
The -1 frameshift site in the E. coli dnaX gene contains the sequence, A AAA AAG,
which follows the X XXY YYZ rule found in retroviral genes (Jacks et al., 1988).
This "slippery sequence" preserves pairing at the first two nucleotides of each codon
for the tRNAs in the aminoacyl and peptidyl sites both before and after the frameshift.
Mutagenesis of the E. coli dnaX frameshifting site has shown that the first three
residues can be nucleotides other than A, but that A's in the second set of three
nucleotides are important to frameshifting (Tsuchihashi and Brown, 1992).
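
The X XXY YYZ rule lends itself to a simple sequence scan. The sketch below is an illustration only, not the method used in this work: the function name and the optional flag are hypothetical, and the rule is read as "positions 1-3 identical, positions 4-6 identical, position 7 unconstrained".

```python
def find_slippery_heptamers(seq, require_a_in_second_triplet=False):
    """Return 0-based start positions of heptamers matching X XXY YYZ.

    Positions 1-3 must be one repeated nucleotide (X), positions 4-6 one
    repeated nucleotide (Y); position 7 (Z) is unconstrained in this sketch.
    If require_a_in_second_triplet is True, Y must be A.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6):
        h = seq[i:i + 7]
        if h[0] == h[1] == h[2] and h[3] == h[4] == h[5]:
            if require_a_in_second_triplet and h[3] != "A":
                continue
            hits.append(i)
    return hits
```

Applied to the E. coli heptamer `AAAAAAG` the scan reports a single site at position 0, whereas a stretch in which two A residues are replaced by G (for example `AGAAGAAAA`) yields no site.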

Immediately downstream of the stop codon is a potential stem-loop 
structure which enhances frameshifting, presumably by causing the ribosome to 
pause. Further, the AAG codon lacks a cognate tRNA in E. coli and thus the G 
residue may facilitate the pause, and has been shown to aid the vigorous frameshifting 
observed in the E. coli dnaX gene (Tsuchihashi and Brown, 1992). A fourth 
component of frameshifting in the E. coli dnaX gene is presence of an upstream 
Shine-Dalgarno sequence which is thought to pair with the 16S rRNA to increase the 
frequency of frameshifting still further (Larsen et al., 1994). 

Examination of the T.th. dnaX sequence reveals a single site that 
fulfills the X XXY YYZ rule in which positions 4-7 are A residues. The site differs
from that in E. coli as all seven residues are A, and the heptanucleotide sequence is
flanked by another A residue on each side (i.e. A9). Surprisingly, the stop codon 
immediately downstream of this site is in the -2 frame, although there is a stop codon 
in the -1 frame 28 nucleotides downstream of the -2 stop codon. Indeed, a -2 
frameshift would fulfill the requirement that the first two nucleotides of each codon in 
the peptidyl and aminoacyl sites be conserved during either a -1 or a -2 frameshift. 
As with the case of E. coli dnaX, there are stem-loop secondary structures
immediately downstream. Finally, there is a Shine-Dalgarno sequence immediately 
adjacent to the frameshift site, as well as another Shine-Dalgarno sequence 22 
nucleotides upstream of the frameshift site. 

Assuming the first stop codon is utilized (i.e. -2 frameshift), the
predicted size of the γ subunit in T.th. is 454 amino acids for a mass of 49.8 kDa, over
2 kDa larger than the 431 residue γ subunit (47.5 kDa) of E. coli. This would result in
2 residues after the -2 frameshift (i.e. after the GluLysLys, the residues LysAla would
be added) to be compared to the result of the -1 frameshift in E. coli which also results
in 2 residues (LysGlu). In the event that a -1 frameshift were utilized in the T.th.
dnaX gene, then an additional 12 residues would be added following the frameshift
for a molecular mass of 50.8 kDa (i.e. after the GluLysLys, the residues
LysProAspProLysAlaProProGlyProThrSer would be added at aa 453-464 of SEQ. ID.
No. 4). As explained later, this nucleotide sequence was found to promote both -1
and -2 frameshifting in E. coli (Fig. 8). But first, we examined T.th. cells by Western
analysis for the presence of two subunits homologous to E. coli γ and τ.

EXAMPLE 2

Frameshifting analysis of the T.th. dnaX gene

Frameshifting was analyzed by inserting the frameshift site into lacZ in
the three different reading frames, followed by plating on X-gal and scoring for blue
or white colony formation (Weiss et al., 1987). The frameshifting region within T.th.
dnaX was subcloned into the EcoRI/BamHI sites of pUC19. These sites are within
the polylinker inside of the β-galactosidase gene. Three constructs were produced
such that the insert was either in frame with the downstream coding sequence of
β-galactosidase, or was out of frame (either -1 or -2). An additional three constructs
were designed by mutating the frameshift sequence and then placing this insert into
the three reading frames of the β-galactosidase gene. These six plasmids were
constructed as described below.

The upstream primer for the shifty sequences was 5'-gcg cgg atc cgg
agg gag aaa aaa aaa gcc tca gcc ca-3' (SEQ. ID. No. 10). The BamHI site for cloning
into pUC is underlined. Also, the stop codon, tga, has been mutated to tca (also
underlined). The upstream primer for the mutant shifty sequence was: 5'-gcg cgg atc
cgg agg gag aga aga aaa gcc tca gcc ca-3' (SEQ. ID. No. 11). The mutant sequence
contains two substitutions of a G for an A residue in the polyA stretch (underlined).
Three downstream primers were utilized with each upstream primer to create two sets
of three inserts in the 0 frame, -1 frame and -2 frame. The sequence of these primers,
and the length of insert (after cutting with EcoRI and BamHI and inserting into
pUC19) are as follows: 5'-gaa tta aat tcg cgc ttc ggg agg tgg g-3' (0 frameshift, total
58 nucleotide insert) (SEQ. ID. No. 12); 5'-gcg cga att cgc gct tcg gga ggt ggg-3' (-1
frame, 54mer insert) (SEQ. ID. No. 13); and 5'-gcg cga att cgg gcg ctt cag gag gtg
gg-3' (-2 frame, 56mer insert) (SEQ. ID. No. 14). The downstream primers have an
EcoRI site (underlined); the EcoRI site of the 0 frame insert was blunt ended to
produce the greater length insert (converting the EcoRI site to an aattaatt sequence).
Also, the tcg sequence, which produces the tga stop codon (underlined), was mutated
to tca in the -2 downstream primer so that readthrough would be allowed after the
frameshift occurred.

In summary, a region surrounding the frameshift site and ending at
least 5 nucleotides past the -1 frameshift stop codon was inserted into the
β-galactosidase gene of pUC19 in the three different reading frames (stop codons were
mutated to prevent stoppage following a frameshift). These three plasmids were
introduced into E. coli and plated with X-gal. The results, in Fig. 8, show that blue
colonies were observed after 24 h incubation with all three plasmids and therefore
both -1 and -2 frameshifting had occurred.

To extend these results, two G residues were introduced into the polyA
tract which should disrupt the ability of this sequence to direct frameshifts. The
mutated slippery sequence was inserted into pUC19 followed by transformation into
E. coli and plating on X-gal. The results showed that both -1 and -2 frameshifting
were prevented, further supporting the conclusion that frameshifting requires the
polyA tract as expected (Fig. 8).

EXAMPLE 3 

Expression vector for T.th. γ and τ

The dnaX gene was cloned into the pET16 expression vector in the
steps shown in Fig. 9. First, the bulk of the gene was cloned into pET16 by removing
the PmlI/XbaI fragment from pAlterdnaX, and placing it into SmaI/XbaI digested
pUC19 to yield pUC19dnaXCterm. The N-terminal sequence of the dnaX gene was
then reconstructed to position an NdeI site at the N-terminus. This was performed by
amplifying the 5' region encoding the N-terminal section of γ/τ using an upstream
primer containing an NdeI site that hybridizes to the dnaX gene at the initiating gtg
codon (i.e. to encode Met where the Met is created by the PCR primer, and the Val is
the initiating gtg start codon of dnaX). The primer sequence for this 5' end was:
5'-gtggtgcatatg gtg agc gcc ctc tac cgc c-3' (SEQ. ID. No. 15) (where the NdeI site is
underlined, and the coding sequence of dnaX follows). The downstream primer
hybridizes past the PmlI site at nucleotide positions 987 - 1004 downstream of the
initiating gtg (primer sequence: 5'-gtggtggtcgac cca gga ggg cca cct cca g-3' (SEQ.
ID. No. 16) where the initial 12 nucleotides contain a SalGI restriction site, followed
by the sequence from the region downstream of the stop codon). The 1.1 kb
PCR product was digested with PmlI/NdeI and the PmlI/NdeI fragment was ligated
into NdeI/PmlI digested pUC19dnaXCterm to form pUC19dnaX. The pUC19dnaX
plasmid was then digested with NdeI and SalI and the 1.9 kb fragment containing the
dnaX gene was purified using the Sephaglas BandPrep Kit (Pharmacia-LKB).
pET16b was digested with NdeI and XhoI. Then the full length dnaX gene was
ligated into the digested pET16b to form pETdnaX.

EXAMPLE 4

Expression of T.th. γ and τ

As discussed in the previous example, the dnaX gene was engineered
into the T7 based IPTG inducible pET16 vector such that the initiation codon was
placed precisely following the Met residue N-terminal leader sequence (Fig. 9). This
should produce a protein containing the entire sequence of γ and τ, along with a 21
residue leader containing 10 contiguous His residues (tagged-τ = 60.6 kDa; tagged-γ
= 52.4 kDa for -2 frameshift). The pETdnaX plasmid was introduced into
BL21(DE3)pLysS cells harboring the gene encoding T7 RNA polymerase under
control of the lac repressor. Log phase cells were induced with IPTG and analyzed
before and after induction in an SDS polyacrylamide gel (Fig. 10, lanes 1 and 2). The
result shows that upon induction, two new proteins are expressed with the
approximate sizes expected of the T.th. γ and τ subunits (larger than E. coli γ, and
smaller than E. coli τ). The two proteins are produced in nearly equal amounts,
similar to the case of the E. coli γ and τ subunits. In Western analysis, antibodies
against the E. coli γ and τ subunits cross-reacted with the induced proteins, further
supporting their identity as T.th. γ and τ (data not shown, but repeated with the pure
subunits shown in Fig. 10, lane 6).

EXAMPLE 5

Purification of T.th. γ and τ

The His-tagged T.th. γ and τ proteins were purified from 6 L of
induced E. coli cells containing the pETdnaX plasmid. Cells were lysed, clarified
from cell debris by centrifugation and the supernatant was applied to a HiTrap chelate
affinity column. Elution of the chelate affinity column yielded approximately 35 mg
of protein in which the two predominant bands migrated in a region consistent with
the molecular weight predicted from the dnaX gene (Fig. 10, lane 3), and produced a
positive signal by Western analysis using polyclonal antibody directed against the E.
coli γ and τ subunits (lane 4). The γ and τ subunits are present in nearly equal
amounts consistent with the nearly equal expression of these proteins in E. coli cells
harboring the pETdnaX plasmid.

The γ and τ subunits were further purified by gel filtration on a
Superose 12 column (Fig. 10, lane 4; Fig. 11). Recovery of T.th. γ and τ subunits
through gel filtration was 81%. The E. coli γ and τ subunits, when separated from one
another, elute during gel filtration as tetramers. A mixture of E. coli γ/τ results in a
mixed tetramer of γ2τ2 along with γ4 and τ4 tetramers (Onrust et al., 1995). The
mixture of T.th. γ/τ elutes ahead of the 150 kDa marker, and thus is consistent with
the expected mass of a γ2τ2 tetramer (225 kDa) and γ4 and τ4 tetramers.
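
The tetramer masses referred to above can be checked with a few lines of arithmetic. This is a rough sketch that assumes the His-tagged monomer masses given in Example 4 (60.6 kDa for the larger subunit and 52.4 kDa for the smaller one, the latter assuming a -2 frameshift); the function and variable names are illustrative only.

```python
# His-tagged monomer masses in kDa, as stated in Example 4.
# Assumption: the gamma value corresponds to the -2 frameshift product.
TAGGED_TAU_KDA = 60.6
TAGGED_GAMMA_KDA = 52.4
MARKER_KDA = 150.0  # gel filtration marker mentioned in the text


def tetramer_mass(n_gamma, n_tau):
    """Mass in kDa of a tetramer with the given subunit counts."""
    if n_gamma + n_tau != 4:
        raise ValueError("a tetramer has exactly four subunits")
    return n_gamma * TAGGED_GAMMA_KDA + n_tau * TAGGED_TAU_KDA


masses = {
    "gamma4": tetramer_mass(4, 0),
    "gamma2tau2": tetramer_mass(2, 2),
    "tau4": tetramer_mass(0, 4),
}
```

With these inputs the mixed tetramer comes to about 226 kDa, close to the 225 kDa quoted in the text, and all three species exceed the 150 kDa marker.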

As described earlier, the dnaX frameshifting sequence could produce
either a -1 or -2 frameshift to yield a His-tagged γ subunit of mass either 53.3 kDa or
52.4 kDa, respectively. These two possible products are too close in mass to
distinguish by migration in SDS gels. It also remains possible that two γ products
are present and do not resolve under the conditions used. The exact protocol for this
purification is described below.

Six liters of BL21(DE3)pLysS pETdnaX cells were grown in LB media
containing 50 µg/ml ampicillin and 25 ng/ml chloramphenicol at 37°C to an O.D. of
0.8 and then IPTG was added to a concentration of 2 mM. After a further 2 h at 37°C,
cells were harvested by centrifugation and stored at -70°C. The following steps were
performed at 4°C. Cells (15 g wet weight) were thawed and resuspended in 45 ml 1X
binding buffer (5 mM imidazole, 0.5 M NaCl, 20 mM Tris-HCl (final pH 7.5)) using a
dounce homogenizer to complete cell lysis and 450 µl of 5% polyamine P (Sigma)
was added. Cell debris was removed by centrifugation at 18,000 rpm for 30 min. in a
Sorvall SS24 rotor at 4°C. The supernatant (Fraction I, 40 ml, 376 mg protein) was
applied to a 5 ml HiTrap Chelating Sepharose column (Pharmacia-LKB). The column
was washed with 25 ml of binding buffer, then with 30 ml of binding buffer
containing 60 mM imidazole, and then eluted with 30 ml of 0.5 M imidazole, 0.5 M
NaCl, 20 mM Tris-HCl (pH 7.5). Fractions of 1 ml were collected and analyzed on
an 8% Coomassie Blue stained SDS polyacrylamide gel. Fractions containing
subunits migrating at the T.th. γ and τ positions, and exhibiting cross reactivity with
antibody to E. coli γ and τ in a Western analysis, were pooled and dialyzed against
buffer A (20 mM Tris-HCl (pH 7.5), 0.1 mM EDTA, 5 mM DTT and 10% glycerol)
containing 0.5 M NaCl (Fraction II, 36 mg in 7 ml). Fraction II was diluted 2-fold
with buffer A and passed through a 2 ml ATP agarose column equilibrated in buffer A
containing 0.2 M NaCl to remove any E. coli γ complex contaminant. Then 0.18 mg
(300 µl) Fraction II was gel filtered on a 24 ml Superose 12 column
(Pharmacia-LKB) in buffer A containing 0.5 M NaCl. After the first 216 drops,
fractions of 200 µl were collected (Fraction III) and analyzed by Western analysis (by
procedures similar to those described in Example 6), by ATPase assays and by
Coomassie Blue staining of an 8% SDS polyacrylamide gel.
The Coomassie stained gels and Western analysis of recombinant T.th. γ and
τ for these purification steps are summarized in Fig. 10.

EXAMPLE 6

Western Analysis of T.th. cells for presence of γ and τ subunits

Polyclonal antibody to E. coli γ/τ. E. coli γ subunit was prepared as
described (Studwell-Vaughan and O'Donnell, 1991). Pure γ subunit (100 µg) was
brought up in Freund's adjuvant and injected subcutaneously into a New Zealand
Rabbit (Pocono Rabbit Farms). After two weeks, a booster consisting of 50 µg γ in
Freund's adjuvant was administered, followed after two weeks by a third injection (50
µg).

The homology between the amino terminal regions of T.th. and E. coli
γ/τ subunits suggested that there may be some epitopes in common between them.
Hence, polyclonal antibody directed against the E. coli γ/τ subunits was raised in
rabbits for use in probing T.th. cells by Western analysis. Fig. 7 shows the results of a
Western analysis of whole T.th. cells lysed in SDS. The results show that in T.th.
cells, the antibody is rather specific for two high molecular weight proteins which
migrate in the vicinity of the molecular masses of E. coli γ and τ subunits.

Procedure for Western Analysis

Samples were analyzed in duplicate 10% SDS polyacrylamide gels by
the Western method (Towbin et al., 1979). One gel was Coomassie stained to evaluate
the pattern of proteins present, and the other gel was then electroblotted onto a
nitrocellulose membrane (Schleicher and Schuell). For molecular size markers, the
Kaleidoscope molecular weight markers (Bio-Rad) were used to verify by
visualization that transfer of proteins onto the blotted membrane had occurred. The
gel used in electroblotting was also stained after electroblotting to confirm that
efficient transfer of protein had occurred. Membranes were blocked using 5% non-fat
milk, washed with 0.05% Tween in TBS (TBS-T) and then incubated for over 1 h
with a 1/5000 dilution of rabbit polyclonal antibody directed against E. coli γ and τ in
1% gelatin in TBS-T at room temperature. Membranes were washed using TBS-T
buffer and then antibody was detected on X-ray film (Kodak) by using the ECL kit
from Amersham and the manufacturer's recommended procedures.

Samples included: 1) a mixture of E. coli γ (15 ng) and τ (15 ng)
subunits; 2) T.th. whole cells (100 µl) suspended in cracking buffer; and 3) purified
T.th. γ and τ fraction II (0.6 µg as a mixture).

EXAMPLE 7

Characterization of the ATPase Activity of γ/τ

The E. coli τ subunit is a DNA dependent ATPase (Lee and Walker,
1987; Tsuchihashi and Kornberg, 1989). The γ subunit binds ATP but does not
hydrolyze it even in the presence of DNA unless other subunits of the DNA
polymerase III holoenzyme are also present (Onrust et al., 1991). Next we examined
the T.th. γ/τ subunits for DNA dependent ATPase activity. The γ/τ preparation was,
in fact, a DNA stimulated ATPase (Fig. 11, top panel). The specific activity of the
T.th. γ/τ was 11.5 mol ATP hydrolyzed/mol γ/τ (as monomer and assuming an equal
mixture of the two). Furthermore, analysis of the gel filtration column fractions
shows that the ATPase activity coelutes with the T.th. γ/τ subunits, supporting
the conclusion that the weak ATPase activity is intrinsic to the γ/τ subunits (Fig. 11).
The specific activity of the γ/τ preparation before gel filtration was the same as after gel
filtration (within 10%), further indicating that the DNA stimulated ATPase is an
inherent activity of the γ/τ subunits. Presumably, only the τ subunit contains ATPase
activity, as in the case of E. coli. Assuming only T.th. τ contains ATPase activity, its
specific activity is twice the observed rate (after factoring out the weight of γ). This
rate is still only one-fifth that of E. coli τ.

The T.th. γ/τ ATPase activity is lower at 37°C than at 65°C (middle
panel), consistent with the expected behavior of protein activity from a thermophilic
source. However, there is no apparent increase in activity in proceeding from 50°C to
65°C (the rapid breakdown of ATP above 65°C precluded measurement of ATPase
activity at temperatures above 65°C). In contrast, the E. coli τ subunit lost most of its
ATPase activity upon elevating the temperature to 50°C (middle panel). These
reactions contain no stabilizers such as a nonionic detergent or gelatin, nor did they
include substrates such as ATP, DNA or magnesium.

Last, the relative stability of T.th. γ/τ and E. coli γ/τ to addition of
NaCl (Fig. 12, bottom panel) was examined. Whereas the E. coli τ subunit rapidly
lost activity at even 0.2 M NaCl, the T.th. γ/τ retained full activity in 1.0 M NaCl and
was still 80% active in 1.5 M NaCl. The detailed procedure for the ATPase activity
assay is described below.

ATPase assays

ATPase assays were performed in 20 µl of 20 mM Tris-HCl (pH 7.5),
8 mM MgCl2 containing 0.72 µg of M13mp18 ssDNA (where indicated), 100 mM
[γ-32P]-ATP (specific activity of 2000-4000 cpm/pmol), and the indicated protein.
Some reactions contained additional NaCl where indicated. Reactions were incubated
at the temperatures indicated in the figure legends for 30 min. and then were
quenched with an equal volume of 25 mM EDTA (final). The aliquots were analyzed
by spotting them (1 µl each) onto thin layer chromatography (TLC) sheets coated with
Cel-300 polyethyleneimine (Brinkmann Instruments Co.). TLC sheets were
developed in 0.5 M lithium chloride, 1 M formic acid. An autoradiogram of the TLC
chromatogram was used to visualize Pi at the solvent front and ATP near the origin
which were then cut from the TLC sheet and quantitated by liquid scintillation. The
extent of ATP hydrolyzed was used to calculate the mol of Pi released per mol of
protein per min. One mol of E. coli τ was calculated assuming a mass of 71 kDa per
monomer. The T.th. γ and τ preparation was treated as an equal mixture and thus one
mole of protein as monomer was the average of the predicted masses of the γ and τ
subunits (54 kDa).
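
The rate calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact computation used: the function name, the parameter names, and the example counts are hypothetical, and the total amount of ATP in the reaction is left as an input rather than fixed.

```python
def pi_released_per_mol_per_min(pi_cpm, atp_cpm, total_atp_pmol,
                                protein_ng, monomer_kda, minutes):
    """mol Pi released per mol protein monomer per minute.

    pi_cpm / atp_cpm: scintillation counts in the Pi and ATP spots cut from the TLC sheet.
    total_atp_pmol:   ATP initially present in the reaction (supplied by the caller).
    protein_ng / monomer_kda gives pmol of monomer, since 1 ng of a 1 kDa species is 1 pmol.
    """
    fraction_hydrolyzed = pi_cpm / (pi_cpm + atp_cpm)
    pi_pmol = fraction_hydrolyzed * total_atp_pmol
    protein_pmol = protein_ng / monomer_kda
    return pi_pmol / protein_pmol / minutes


# Average of the predicted untagged masses quoted in the text:
# 49.8 kDa (smaller subunit, -2 frameshift) and 58.0 kDa (full length product).
MEAN_MONOMER_KDA = (49.8 + 58.0) / 2  # about 54 kDa

# Hypothetical counts and amounts, chosen only to exercise the function.
example_rate = pi_released_per_mol_per_min(
    pi_cpm=1000, atp_cpm=9000, total_atp_pmol=2000,
    protein_ng=54, monomer_kda=54, minutes=30)
```

Taking the ratio of counts in the two spots means the result does not depend on the exact specific activity of the labeled ATP, only on the total ATP present.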

EXAMPLE 8

Homology of T.th. γ/τ to dnaX gene products of other organisms

The XbaI insert encoded an open reading frame, starting with a GTG
codon, of 529 amino acids in length (58.0 kDa), closer to the predicted length of the
B. subtilis τ subunit (563 amino acids, 62.7 kDa mass) (Alonso et al., 1990) than the E.
coli τ subunit (71.1 kDa) (Yin et al., 1986). The dnaX gene encoding the γ/τ subunits
of E. coli DNA polymerase III holoenzyme is homologous to the holB gene encoding
the δ' subunit of the γ complex clamp loader, and this homology extends to all 5
subunits of the eukaryotic RFC clamp loader as well as the bacteriophage gene
protein 44 of the gp44/62 clamp loading complex (O'Donnell et al., 1993). These
gene products show greatest homology over the N-terminal 166 amino acid residues
(of E. coli dnaX); the C-terminal regions are more divergent. Fig. 4 shows an
alignment of the amino acid sequence of the N-terminal regions of the T.th. dnaX
gene product to those of several other bacteria. The consensus GXXGXGKT (SEQ.
ID. No. 17) motif for nucleotide binding is conserved in all these protein products.
Further, the E. coli δ' crystal structure reveals one atom of zinc coordinated to four
Cys residues (Guenther, 1996). These four Cys residues are conserved in the E. coli
dnaX gene, and the γ and τ subunits encoded by E. coli dnaX bind one atom of zinc.
These Cys residues are also conserved in T.th. dnaX (shown in Fig. 4). Overall, the
level of amino acid identity relative to E. coli dnaX in the N-terminal 165 residues of
T.th. dnaX is 53%. The T.th. dnaX gene is just as homologous to the B. subtilis dnaX
gene (53% identity) as it is to E. coli dnaX. After this region of homology, the
C-terminal region of T.th. dnaX shares 26% and 20% identity to E. coli and B. subtilis
dnaX, respectively. A proline rich region, downstream of the conserved region, is
also present in T.th. dnaX (residues 346-375), but not in the B. subtilis dnaX (see Figs.
3A and 3B). The overall identity between E. coli dnaX and T.th. dnaX over the entire
gene is 34%. Identity of T.th. dnaX to B. subtilis dnaX over the entire gene is 28%.
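
Percent identity figures of this kind are computed from an alignment. The sketch below shows one common convention and a check for the GXXGXGKT consensus; it is an assumption-laden illustration, since the text does not state how identity was counted, and the function names and toy sequences are hypothetical.

```python
import re


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns, skipping columns where both are gaps.

    Columns with a gap in only one sequence count as mismatches
    (one convention of several; the source does not specify which was used).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# GXXGXGKT nucleotide-binding consensus from the text; X is any residue.
NUCLEOTIDE_BINDING_MOTIF = re.compile(r"G..G.GKT")


def has_nucleotide_binding_motif(protein_seq):
    """True if the protein sequence contains a GXXGXGKT match."""
    return NUCLEOTIDE_BINDING_MOTIF.search(protein_seq.upper()) is not None
```

For example, two toy aligned segments such as `GPRGTGKT` and `GPPGVGKT` share 6 of 8 positions, or 75% identity, and both match the consensus.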

Comparison of dnaX genes from T.th. and E. coli

The above identifies a homologue of the dnaX gene of E. coli in
Thermus thermophilus. Like the E. coli gene, T.th. dnaX encodes two related proteins
through use of a highly efficient translational frameshift. The T.th. γ/τ subunits are
tetramers, or mixed tetramers, similar to the γ and τ subunits of E. coli. Further, the
γ/τ subunit is a DNA stimulated ATPase like its E. coli counterpart. As expected for
proteins from a thermophile, the T.th. γ/τ ATPase activity is thermostable and
resistant to added salt.

In E. coli, γ is a component of the clamp loader, and the τ subunit
serves the function of holding the clamp loading apparatus together with two DNA
polymerases for coordinated replication of duplex DNA. The presence of γ in T.th.
suggests it has a clamp loading apparatus and thus a clamp as well. The presence of
the τ subunit of T.th. implies that T.th. contains a replicative polymerase with a
structure similar to that of E. coli DNA polymerase III holoenzyme.

A significant difference between E. coli and T.th. dnaX genes is in the
translational frameshift sequence. In E. coli, the heptamer frameshift site contains six
A residues followed by a G residue in the context A AAA AAG. This sequence
satisfies the X XXY YYZ rule for -1 frameshifting. The frameshift is made more
efficient by the absence of the AAG tRNA for Lys which presumably leads to stalling
of the ribosome at the frameshift site and increases the efficiency of frameshifting
(Tsuchihashi and Brown, 1992). Two additional aids to frameshifting include a
downstream hairpin and an upstream Shine-Dalgarno sequence (Tsuchihashi and
Kornberg, 1990; Larsen et al., 1994). The -1 frameshift leads to incorporation of one
unique residue at the C-terminus of E. coli γ before encounter with a stop codon.

In T.th., the dnaX frameshifting heptamer is A AAA AAA, and it is
flanked by two other A residues, one on each side. There is also a downstream region
of secondary structure. The nearest downstream stop codon is positioned such that
γ would contain only one unique amino acid, as in E. coli. However, the T.th.
stop codon is in the -2 reading frame and thus requires a -2 frameshift. No precedent
exists in nature for -2 frameshifting, although -2 frameshifting has been shown to
occur in test cases (Weiss et al., 1987). In vivo analysis of the T.th. frameshift
sequence shows that this natural sequence promotes both -1 and -2 frameshifting in E.
coli. Whereas the -2 frameshift results in only one unique C-terminal residue, a -1
frameshift would result in an extension of 12 C-terminal residues. At present, the
results do not discriminate which path occurs in T.th., a -1 or -2 frameshift, or a
combination of the two.

There are two Shine-Dalgarno sequences just upstream of the
frameshift site in T.th. dnaX. In two cases of frameshifting in E. coli, an upstream
Shine-Dalgarno sequence has been shown to stimulate frameshifting (reviewed in
Weiss et al., 1987). In release factor 2 (RF2), the Shine-Dalgarno is 3 nucleotides
upstream of the shift site, and it stimulates a +1 frameshift event. In the case of E.
coli dnaX, a Shine-Dalgarno sequence 10 nucleotides upstream of the shift sequence
stimulates the -1 frameshift. One of the T.th. dnaX Shine-Dalgarno sequences is
immediately adjacent to the frameshift sequence with no extra space, the other is 22
residues upstream of the frameshift site. Which of these Shine-Dalgarno sequences
plays a role in T.th. dnaX frameshifting, if any, will require future study.

In E. coli, efficient separation of the two polypeptides, γ and τ, is
achieved by mutation of the frameshift site such that only one polypeptide is produced
from the gene (Tsuchihashi and Kornberg, 1990). Substitution of G for A in two
positions of the heptamer of T.th. dnaX eliminates frameshifting and thus should be a
source to obtain τ subunit free of γ. To produce pure γ subunit free of τ, the
frameshifting site and sequence immediately downstream of it can be replaced with
an in-frame sequence with a stop codon.

Examination of the B. subtilis dnaX gene shows no frameshift
sequence that satisfies the X XXY YYZ rule. Hence, it would appear that dnaX does
not make two proteins in this gram positive organism.

Rapid thermal motions associated with high temperature may make
coordination of complicated processes more difficult. It seems possible that
organizing the components of the replication apparatus may become yet more
important at higher temperature. Hence, production of a τ subunit that could be used
to crosslink two polymerases and a clamp loader into one organized particle may be
most useful at elevated temperature.

As stated above, the following examples describe the continued
isolation and purification of the substantial entirety of the Polymerase III from the
extreme thermophile Thermus thermophilus. It is to be understood that the following
exposition is reflective of the protocol and characteristics, both morphological and
functional, of the Polymerase III-type enzymes that are the focus of the present
invention, and that the invention is hereby illustrated and comprehends the entire class
of enzymes of thermophilic origin.

EXAMPLE 9

Purification of the Thermus thermophilus DNA polymerase III

All steps in the purification assay were performed at 4°C. The
following assay was used in the purification of DNA polymerase from T.th. cell
extracts. Assays contained 2.5 mg activated calf thymus DNA (Sigma Chemical
Company) in a final volume of 25 ml of 20 mM Tris-Cl (pH 7.5), 8 mM MgCl2, 5
mM DTT, 0.5 mM EDTA, 40 mg/ml BSA, 4% glycerol, 0.5 mM ATP, 3 mM each
dCTP, dGTP, dATP, and 20 mM [α-32P]dTTP. An aliquot of the fraction to be
assayed was added to the assay mixture on ice followed by incubation at 60°C for 5
min. DNA synthesis was quantitated using DE81 paper followed by washing off
unincorporated nucleotide. Incorporated nucleotide was determined by scintillation
counting of the filters.

Thermus thermophilus cell extracts were prepared by suspending 35
grams of cell paste in 200 ml of 50 mM Tris-HCl, pH 7.5, 30 mM spermidine, 100
mM NaCl, 0.5 mM EDTA, 5 mM DTT, 5% glycerol, followed by disruption by
passage through a French pressure cell (15,000 PSI). Cell debris was removed by
centrifugation (12,000 RPM, 60 min). DNA polymerase III in the clarified
supernatant was precipitated by treatment with ammonium sulfate (0.226 gm/liter)
and recovered by centrifugation. This fraction was then backwashed with the same
buffer (but lacking spermidine) containing 0.20 gm/l ammonium sulfate. The pellet
was then resuspended in buffer A and dialyzed overnight against 2 liters of buffer A;
a precipitate which formed during dialysis was removed by centrifugation (17,000
RPM, 20 min).

The clarified dialysis supernatant, containing approximately 336 mg of
protein, was applied onto a 60 ml heparin agarose column equilibrated in buffer A
which was washed with the same buffer until A280 reached baseline. The column
was developed with a 500 ml linear gradient of buffer A from 0 to 500 mM NaCl.
More tightly adhered proteins were washed off the column by treatment with buffer A
(20 mM Tris-HCl, pH 7.5, 0.1 mM EDTA, 5 mM DTT, and 10% glycerol) and 1 M
NaCl. Some DNA polymerase activity flowed through the column. Two peaks
(HEP.P1 and HEP.P2) of DNA polymerase activity eluted from the heparin agarose
column containing 20 mg and 2 mg of total protein, respectively (Fig. 13A). These
were kept separate throughout the remainder of the purification protocol.
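
For orientation, the nominal NaCl concentration at any point in a linear gradient such as the 500 ml, 0 to 500 mM gradient described above can be estimated by simple interpolation. The following Python sketch is illustrative only; the function and parameter names are hypothetical, and the calculation assumes an ideal linear gradient with no column dead volume.

```python
def nacl_mM(volume_ml, gradient_ml=500.0, start_mM=0.0, end_mM=500.0):
    """Nominal NaCl concentration (mM) after volume_ml of a linear gradient.

    Assumes an ideal linear gradient; column dead volume and mixing
    are ignored.
    """
    if not 0 <= volume_ml <= gradient_ml:
        raise ValueError("volume lies outside the gradient")
    return start_mM + (end_mM - start_mM) * volume_ml / gradient_ml


# Midpoint of the 500 ml, 0 to 500 mM gradient.
print(nacl_mM(250))
```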

The Pol III resided in HEP.P1 as indicated by the following criteria:
1) Western analysis using antibody directed against the α subunit of E. coli Pol III
indicated presence of Pol III in HEP.P1; 2) Only the HEP.P1 fraction was capable of
extending a single primer around an M13mp18 7.2 kb ssDNA circle (explained later
in Example 16), such long primer extension being a characteristic of Pol III-type
enzymes; and 3) Only the HEP.P1 provided DNA polymerase activity that was
retained on an ATP-agarose affinity column, which is indicative of a Pol III-type
DNA polymerase since the γ and τ subunits are ATP interactive proteins.

The first peak of the heparin agarose column (HEP.P1: 20 mg in 127.5
ml) was dialyzed against buffer A and applied onto a 2 ml N6-linkage ATP agarose
column pre-equilibrated in the same buffer. Bound protein was eluted by a slow (0.05
ml/min) wash with buffer A + 2M NaCl and collected into 200 μl fractions.
Chromatography of peak HEP.P1 yielded a flow-through (HEP.P1-ATP-FT) and a
bound fraction (HEP.P1-ATP-Bound) (Fig. 13B). Binding of peak HEP.P2 to the
ATP column could not be detected, though DNA polymerase activity was recovered
in the flow-through.

The HEP.P1-ATP-Bound fractions from the ATP agarose
chromatographic step were further purified by anion exchange over MonoQ. The
HEP.P1-ATP-Bound fractions were diluted with buffer A to approximately the
conductivity of buffer A plus 25 mM NaCl and applied to a 1 ml MonoQ column
equilibrated in buffer A. DNA polymerase activity eluted in the flow-through and in
two resolved chromatographic peaks (MONOQ peak1 and peak2) (Fig. 13C). Peak 2
was by far the major source of DNA polymerase activity. Western analysis using
rabbit antibody directed against the E. coli α subunit confirmed presence of the α
subunit in the second peak (see the Western analysis in Fig. 14B). Antibody against
the E. coli τ subunit also confirmed the presence of the τ subunit in the second peak.
Some reaction against α and τ was also present in the minor peak (first peak). The
Coomassie Blue SDS polyacrylamide gel of the MonoQ fractions (Fig. 14A) showed
a band that co-migrated with E. coli α and was in the same position as the antibody
reactive material (antibody against E. coli α). Also present are bands corresponding
to τ, γ, δ, and δ'. These subunits, along with β, are all that is necessary for rapid and
processive synthesis and primer extension over a long (> 7 kb) stretch of ssDNA in
the case of E. coli DNA Polymerase III holoenzyme.

The Pol III-type enzyme purified from T.th. may be a Pol III*-like
enzyme that contains the DNA polymerase and clamp loader subunits (i.e., like the Pol
III* of E. coli). The evidence for this is: 1) the presence of dnaX and dnaE gene
products in the same column fractions as indicated by Western analysis (see above);
2) the ability of this enzyme to extend a primer around a 7.2 kb circular ssDNA upon
adding only β (see Example 16); 3) stimulation of Pol III by adding β on linear DNA,
indicating β subunit is not present in saturating amounts (see Example 15); 4) the
presence of τ in T.th., which may glue the polymerase and clamp loader into a Pol III*
as in E. coli; and 5) the comigration of α with subunits τ, γ, δ and δ' of the clamp
loader in the column fractions of the last chromatographic step (MonoQ, Fig. 14A).



Micro-sequencing of T.th. DNA Polymerase III α subunit

The α subunit from the purified T.th. DNA polymerase III
(HEP.P1.ATP-Bound.MONOQ peak2) was blotted onto PVDF membrane and was
cut out of the SDS-PAGE gel and submitted to the Protein-Nucleic Acid Facility at
Rockefeller University for N-terminal sequencing and proteolytic digestion,
purification and microsequencing of the resultant peptides. Analysis of the α
candidate band (Mw 130 kD) yielded four peptides, two of which (TTH1, TTH2)
showed sequence similarity to α subunits from various bacterial sources (see Fig. 15).

EXAMPLE 10

Identification of the Thermus thermophilus dnaE gene encoding the α subunit of
DNA polymerase III replication enzyme
Cloning of the dnaE gene was started with the sequence of the TTH1
peptide from the purified α subunit (FFIEIQNHGLSEQK) (SEQ. ID. No. 61). The
fragment was aligned to a region at approximately 180 amino acids downstream of
the N-termini of several other known α subunits as shown in Fig. 15. The upstream
33mer (5'-GTGGGATCCGTGGTTCTGGATCTCGATGAAGAA-3') (SEQ. ID.
No. 31) consists of a BamHI site within the first 9 nucleotides (underlined) and the
sequence coding for the following peptide HGLSEQK on the complementary strand.
The downstream 29mer (5'-GTGGGATCCACGGSCTSTCSGAGCAGAAG-3')
(SEQ. ID. No. 32) consists of a BamHI site within the first 9 nucleotides (underlined)
and the following sequence coding for the peptide FFIEIQNH (SEQ. ID. No. 62).

These two primers were directed away from each other for the purpose
of performing inverse PCR (also called circular PCR). The amplification reactions
contained 10 ng T.th. genomic DNA (that had been cut with XmaI and religated), 0.5
mM of each primer, in a volume of 100 μl of Vent polymerase reaction mixture
containing 10 μl ThermoPol Buffer, 0.5 mM of each dNTP and 0.25 mM MgSO4.
Amplification was performed using the following cycling scheme:

1. 4 cycles of: 95.5°C - 30 sec, 45°C - 30 sec, 75°C - 8 min.

2. 6 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 75°C - 6 min.

3. 30 cycles of: 95.5°C - 30 sec, 52.5°C - 30 sec, 75°C - 5 min.

A 1.4 kb fragment was obtained and cloned into pBS-SK:BamHI (i.e. pBS-SK
(Stratagene) was cut with BamHI). This sequence was bracketed by the 29mer
primer on both sides and contained the sequence coding for the N-terminal part of the
subunit up to the peptide used for primer design.
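
For planning purposes, the total programmed hold time of a three-stage cycling scheme such as the one above can be tallied as in the following Python sketch. The data layout and function name are hypothetical; ramp times between temperatures depend on the thermal cycler and are not included.

```python
def total_hold_seconds(stages):
    """Sum programmed hold times.

    stages: list of (number_of_cycles, [seconds for each step in one cycle]).
    Ramp times between steps are instrument dependent and are ignored.
    """
    return sum(cycles * sum(step_seconds) for cycles, step_seconds in stages)


# Cycling scheme used for the first inverse PCR of dnaE (hold times only).
scheme = [
    (4,  [30, 30, 8 * 60]),   # 95.5°C, 45°C, 75°C
    (6,  [30, 30, 6 * 60]),   # 95.5°C, 50°C, 75°C
    (30, [30, 30, 5 * 60]),   # 95.5°C, 52.5°C, 75°C
]

total = total_hold_seconds(scheme)
print(total, "seconds, i.e.", total / 60, "minutes")
```

The same function can be applied to the other cycling schemes in these examples by changing the entries of the list.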
To obtain further dnaE gene sequence, the TTH2 peptide was used. It
was aligned to a region about 600 amino acids from the N-termini of the other known
subunits (Fig. 15B).

The upstream 34mer
(5'-GCGGGATCCTCAACGAGGACCTCTCCATCTTCAA-3') (SEQ. ID. No. 33)
consists of a BamHI site within the first 9 nucleotides (underlined) and the sequence
from the end of the fragment previously obtained. The downstream 35mer
(5'-GCGGGATCCTTGTCGTCSAGSGTSAGSGCGTCGTA-3') (SEQ. ID. No. 34)
consists of a BamHI site within the first 9 nucleotides (underlined) and the following
sequence coding for the peptide YDALTLDD (SEQ. ID. No. 63) on the
complementary strand. The amplification reactions contained 10 ng T.th. genomic
DNA, 0.5 mM of each primer, in a volume of 100 μl of Vent polymerase reaction
mixture containing 10 μl ThermoPol Buffer, 0.5 mM of each dNTP and 0.25 mM
MgSO4. Amplification was performed using the following cycling scheme:

1. 4 cycles of: 95.5°C - 30 sec, 45°C - 30 sec, 75°C - 8 min.

2. 6 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 75°C - 6 min.

3. 30 cycles of: 95.5°C - 30 sec, 55°C - 30 sec, 75°C - 5 min.

A 1.2 kb PCR fragment was obtained and cloned into pUC19:BamHI. The fragment
was bracketed by the downstream primer on both sides and contained the region
overlapping by 56 bp with the fragment previously cloned.
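
The relationship between a degenerate primer and the peptide it encodes on the complementary strand can be checked as sketched below in Python. This is an illustration under stated assumptions, not the procedure used by the inventors: the helper names are hypothetical, the standard genetic code is assumed, and only the ambiguity codes S (G or C) and N (any base) that appear in these primers are handled. A codon whose expansions encode more than one residue is reported as X.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Only the ambiguity codes used in these primers are included.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "N": "ACGT"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "S": "S", "N": "N"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def translate_degenerate(seq):
    """Translate in frame 1; emit X where the expansions of a degenerate
    codon do not all encode the same residue. Trailing bases are dropped."""
    peptide = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        expansions = product(*(IUPAC[b] for b in seq[i:i + 3]))
        residues = {CODON_TABLE["".join(c)] for c in expansions}
        peptide.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(peptide)


# Downstream 35mer (SEQ. ID. No. 34); the first 9 nucleotides carry the BamHI site.
primer = "GCGGGATCCTTGTCGTCSAGSGTSAGSGCGTCGTA"
annealing_part = primer[9:]
print(translate_degenerate(reverse_complement(annealing_part)))
```

Run on the 35mer, the sketch reproduces the YDALTLDD peptide stated above.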

To obtain yet more dnaE sequence, the following primers were used.
The upstream 39mer
(5'-GTGTGGATCCTCGTCCCCCTCATGCGCGACCAGGAAGGG-3') (SEQ. ID.
Nos. 35 and 114) consists of a BamHI site within the first 10 nucleotides (underlined)
and the sequence from the end of the fragment previously obtained. The downstream
27mer (5'-GTGTGGATCCTTCTTCTTSCCCATSGC-3') (SEQ. ID. No. 36) consists
of a BamHI site within the first 10 nucleotides (underlined), and the sequence coding
for the peptide AMGKKK (SEQ. ID. No. 64) (at position approximately 800 residues
from the N terminus) on the complementary strand. The AMGKKK (SEQ. ID.
No. 64) sequence was chosen for primer design as it is highly conserved among the
known gram-negative α subunits. The amplification reactions contained 10 ng T.th.
genomic DNA, 0.5 mM of each primer, in a volume of 100 μl of Taq polymerase
reaction mixture containing 10 μl PCR Buffer, 0.5 mM of each dNTP and 2.5 mM
MgCl2. Amplification was performed using the following cycling scheme:

1. 3 cycles of: 95.5°C - 30 sec, 45°C - 30 sec, 72°C - 8 min.

2. 6 cycles of: 94.5°C - 30 sec, 55°C - 30 sec, 72°C - 6 min.

3. 32 cycles of: 94.5°C - 30 sec, 50°C - 30 sec, 72°C - 5 min.

A 2.3 kb PCR fragment was obtained instead of the expected 0.6 kb fragment. BamHI
digestion of the PCR product resulted in three fragments of 1.1 kb, 0.7 kb and 0.5 kb.
The 1.1 kb fragment was cloned into pUC19:BamHI. It turned out to be the one
adjacent to the fragment previously obtained and contained the dnaE sequence right
up to the region coding for the AMGKKK (SEQ. ID. No. 64) peptide, but was
disrupted by an intron just upstream of this region. The sequence that follows this
was amplified from the 2.3 kb original PCR product using the same conditions and
cycling scheme as for the 2.3 kb fragment. The downstream primer was the same as in
the previous step. The upstream 27mer
(3'-GTGTGGATCCGTGGTGACCTTAGCCAC-5') (SEQ. ID. Nos. 37 and 115)
consisted of a BamHI site within the first 9 nucleotides (underlined) and the sequence
from the end of the 1.1 kb fragment previously described.

The expected 1.2 kb PCR fragment was obtained and cloned into
pUC19:SmaI. This fragment coded for the rest of the intein and the end of it was used
to obtain the next sequence of dnaE downstream of this region. The upstream 30mer
(3'-TTCGTGTCCGAGGACCTTGTGGTCCACAAC-5') (SEQ. ID. Nos. 38 and 116)
was a sequence from the end of the intron. The downstream 23mer
(5'-CCAGAATCGTCTGCTGGTCGTAG-3') (SEQ. ID. No. 39) was the sequence
from the end of the dnaE gene of D.rad. (coding on the complementary strand for the
region slightly homologous in the distantly related α subunits and possibly highly
homologous between T.th. and D.rad. α subunits). The amplification reactions
contained 10 ng T.th. genomic DNA, 0.5 mM of each primer, in a volume of 100 μl of
Vent polymerase reaction mixture containing 10 μl ThermoPol Buffer, 0.5 mM of
each dNTP and 0.1 mM MgSO4. Amplification was performed using the following
cycling scheme:

1. 3 cycles of: 95.5°C - 30 sec, 55°C - 30 sec, 75°C - 8 min.

2. 32 cycles of: 94.5°C - 30 sec, 50°C - 30 sec, 75°C - 5 min.

A 2.5 kb PCR fragment was obtained and cloned into pUC19:SmaI. This fragment
contained the dnaE sequence coding for the 300 amino acids next to the AMGKKK
(SEQ. ID. No. 64) region disrupted by yet a second intein inside another sequence
that is conserved among the known α subunits (FNKSHSAAY) (SEQ. ID. No. 65).

To obtain the rest of the dnaE gene the upstream 19mer
(5'-AGCACCCTGGAGGAGCTTC-3') (SEQ. ID. No. 40) from the end of the known
dnaE sequence was used. The downstream primer was:
5'-CATGTCGTACTGGGTGTAC-3' (SEQ. ID. No. 41). The amplification reactions
contained 10 ng T.th. genomic DNA, 0.5 mM of each primer, in a volume of 100 μl of
Vent polymerase reaction mixture containing 10 μl ThermoPol Buffer, 0.5 mM of
each dNTP and 0.1 mM MgSO4. Amplification was performed using the following
cycling scheme:

1. 3 cycles of: 95.5°C - 30 sec, 55°C - 30 sec, 75°C - 8 min.

2. 32 cycles of: 94.5°C - 30 sec, 50°C - 30 sec, 75°C - 5 min.

A 1.0 kb fragment bracketed by this upstream primer was obtained. It contained the 3'
end of the dnaE gene.

EXAMPLE 11

Cloning and Expression of the Thermus thermophilus dnaQ gene encoding the ε
subunit of DNA polymerase III replication enzyme

Cloning of dnaQ

The dnaQ gene of E. coli and the corresponding region of PolC of B.
subtilis, evolutionarily divergent organisms, share approximately 30% identity.
Comparison of the predicted amino acid sequences for DnaQ (ε) of E. coli and PolC
of B. subtilis revealed two highly conserved regions (Fig. 17). Within each of these
regions, a nine amino acid sequence was used to design two oligonucleotide primers
for use in the polymerase chain reaction.

The regions highly conserved among Pol III exonucleases were
chosen to design the degenerate primers for the amplification of a T.th. dnaQ internal
fragment (see Fig. 17). DNA oligonucleotides for amplification of T.th. genomic
DNA were as follows. The upstream 27mer
(5'-GTSGTSNNSGACNNSGAGACSACSGGG-3') (SEQ. ID. No. 42) encodes the
following sequence (VVXDXETTG) (SEQ. ID. No. 66). The downstream 27mer
(5'-GAASCCSNNGTCGAASNNGGCGTTGTG-3') (SEQ. ID. No. 43) encodes the
sequence HNAXFDXGF (SEQ. ID. No. 67) on the complementary strand. The
amplification reactions contained 10 ng T.th. genomic DNA, 0.5 mM of each primer,
in a volume of 100 μl of Vent polymerase reaction mixture containing 10 μl
ThermoPol Buffer, 0.5 mM of each dNTP and 0.5 mM MgSO4. Amplification was
performed using the following cycling scheme:

1. 5 cycles of: 95.5°C - 30 sec, 40°C - 30 sec, 72°C - 2 min.

2. 5 cycles of: 95.5°C - 30 sec, 45°C - 30 sec, 72°C - 2 min.

3. 30 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 72°C - 30 min.
Products were visualized in a 1.5% native agarose gel. A fragment of the expected
size of 270 bp was cloned into the SmaI site of pUC19 and sequenced with the
CircumVent Thermal Cycle DNA sequencing kit according to the manufacturer's
instructions (New England Biolabs).

To obtain further sequence of the dnaQ gene, genomic DNA was
digested with either XhoI, BamHI, KpnI or NcoI. These restriction enzymes were
chosen because they cut T.th. genomic DNA frequently. Approximately 0.1 μg of
DNA for each digest was ligated by T4 DNA ligase in 50 μl of ligation buffer (50
mM Tris-HCl (pH 7.8), 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, 25 mg/ml
bovine serum albumin) overnight at 20°C. The ligation mixtures were used for
circular PCR.

DNA oligonucleotides for amplification of T.th. genomic DNA were
the following. The upstream 27mer
(5'-CGGGGATCCACCTCAATCACCTCGTGG-3') (SEQ. ID. No. 44) consists of a
BamHI site within the first 9 nucleotides (underlined) and the sequence
complementary to 42-61 bp region of the previously cloned dnaQ fragment. The
downstream 30mer (5'-CGGGGATCCGCCACCTTGCGGCTCCGGGTG-3') (SEQ.
ID. No. 45) consists of a BamHI site within the first 9 nucleotides (underlined) and
the sequence corresponding to 240-261 bp region of the dnaQ fragment (see Fig. 17).

The amplification reactions contained 1 ng T.th. genomic DNA (that
had been cut with NcoI and religated into circular DNA for circular PCR), 0.4 mM of
each primer, in a volume of 100 μl of Vent polymerase reaction mixture containing
10 μl ThermoPol Buffer, 0.5 mM of each dNTP, 0.5 mM MgSO4, and 10% DMSO.
Circular amplification was performed using the following cycling scheme:

1. 5 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 72°C - 8 min.

2. 35 cycles of: 95.5°C - 30 sec, 55°C - 30 sec, 72°C - 6 min.

3. 72°C - 10 min.

A 1.5 kb fragment was obtained and cloned into the BamHI site of the pUC19 vector.
Partial sequencing of the fragment revealed that it contained the dnaQ regions
adjacent to sequences corresponding to the PCR primers and hence contained the
sequences both upstream and downstream of the previously cloned dnaQ fragment.
One of the NcoI sites turned out to be approximately 300 bp downstream of the end of the
first cloned dnaQ sequence and hence did not include the 3' end of dnaQ. To obtain
the 3' end, another inverse PCR reaction was performed. Since an ApaI restriction site
was recognized within this newly sequenced dnaQ fragment, the circular PCR
procedure was performed using as template an ApaI digest of T.th. genomic DNA that
was ligated (circularized) under the same conditions as described above.

DNA oligonucleotides for amplification of the ApaI/religated T.th.
genomic DNA were as follows. The upstream 31mer
(5'-GCGCTCTAGACGAGTTCCCAAAGCGTGCGGT-3') (SEQ. ID. No. 46)
consists of an XbaI site within the first 10 nucleotides (underlined) and the sequence
complementary to the region downstream of the ApaI restriction site in the newly
sequenced dnaQ fragment. The downstream 25mer
(5'-CGCGTCTAGATCACCTGTATCCAGA-3') (SEQ. ID. No. 47) consists of an
XbaI site within the first 10 nucleotides (underlined) and the sequence corresponding
to another region downstream of the ApaI restriction site in the newly sequenced
dnaQ fragment. The 1.7 kb PCR fragment was cloned into the XbaI site of the
pUC19 vector and partially sequenced. The sequence of dnaQ, and the protein
sequence of the ε subunit encoded by it, is shown in Fig. 18.

The dnaQ gene is encoded by an open reading frame of 209 (or 190,
depending on which Val is used as the initiating residue) amino acids in length
(23598.5 Da, or 21383.8 Da for the shorter version), similar to the length of the E. coli
ε subunit (243 amino acids, 27099.1 Da mass) (see Fig. 17).

The entire amino acid sequence of the ε subunit predicted from the
T.th. dnaQ gene aligns with the predicted amino acid sequence of the dnaQ genes of
other organisms with only a few gaps and insertions (the first two amino acids, and
four positions downstream) (Fig. 17). The consensus motifs VVXDXETTG (SEQ.
ID. Nos. 66 and 68), HNAXFDXGF (SEQ. ID. No. 67), and HRALYD (SEQ. ID.
No. 70), characteristic for exonucleases, are conserved. Overall, the level of amino
acid identity relative to most of the known ε subunits, or corresponding proofreading
exonuclease domains of gram positive PolC genes is approximately 30%. Upstream
of start 1 (Fig. 17) there were stop codons in all three reading frames.

Expression of dnaQ

The dnaQ gene was cloned into the pET24-a expression vector in
two steps. First, the PCR fragment encoding the N-terminal part of the gene was
cloned into the NdeI/ApaI sites of the pUC19 plasmid containing the ApaI inverse
PCR fragment. DNA oligonucleotides for amplification of T.th. genomic DNA were
as follows. The upstream 33mer
(5'-GCGGCGCATATGGTGGTGGTCCTGGACCTGGAG-3') (SEQ. ID. No. 48)
consists of an NdeI site within the first 12 nucleotides (underlined) and the beginning
of the dnaQ gene. The downstream 25mer
(5'-CGCGTCTAGATCACCTGTATCCAGA-3') (SEQ. ID. No. 49), already used for
ApaI circular PCR, consists of an XbaI site within the first 10 nucleotides
(underlined) and the sequence corresponding to the region downstream of the ApaI
restriction site. The 2.2 kb NdeI/SalI fragment was then cloned into the NdeI/XhoI
sites of the pET16 vector to produce pET24-a:dnaQ. The ε subunit was expressed in
the BL21/LysS strain transformed by the pET24-a:dnaQ plasmid.

EXAMPLE 12

The Thermus thermophilus dnaN gene encoding the β subunit of DNA polymerase III
replication enzyme

Strategy of cloning dnaN by use of dnaA

DnaN proteins are highly divergent in bacteria, making it difficult to
clone them by homology. The level of identity between DnaN representatives from
E. coli and B. subtilis is as low as 18%. These 18% of identical amino acid residues
are dispersed through the proteins rather than clustering together in conserved
regions, further complicating use of homology to design PCR primers. However, one
feature of dnaN genes among widely different bacteria is their location in the
chromosome. They appear to be near the origin, and immediately adjacent to the
dnaA gene. The dnaA genes show good homology among different bacteria and, thus,
dnaA was first cloned in order to obtain a DNA probe that is likely near dnaN.
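
Identity figures such as the 18% quoted above are computed over aligned sequences. The Python sketch below shows one common way to do this, counting identical residues over alignment columns in which neither sequence has a gap. The function name and the toy alignment are hypothetical, and other conventions for the denominator exist, so values obtained this way need not match those reported here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment for illustration only (not taken from any DnaN sequence).
print(round(percent_identity("MKV-LTA", "MRVQLSA"), 1))
```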

Identification of dnaA and dnaN

The dnaA genes of E. coli and B. subtilis share 58% identity at the
amino acid sequence level within the ATP-binding domain (or among the
representatives of gram-positive and gram-negative bacteria, evolutionarily divergent
organisms). Comparison of the predicted amino acid sequences encoded by dnaA of
E. coli and B. subtilis revealed two highly conserved regions (Fig. 19). Within each
of these regions, a seven amino acid sequence was used to design two oligonucleotide
primers for use in the polymerase chain reaction. The DNA oligonucleotides for
amplification of T.th. genomic DNA were as follows. The upstream 20mer
(5'-GTSCTSGTSAAGACSCACTT-3') (SEQ. ID. No. 50) encodes the following
sequence: VLVKTHL (SEQ. ID. No. 69). The downstream 21mer
(5'-SAGSAGSGCGTTGAASGTGTG-3', where S is G or C) (SEQ. ID. No. 51)
encodes the sequence: HTFNALL (SEQ. ID. No. 71), on the complementary strand.
The amplification reactions contained 10 ng T.th. genomic DNA, 0.5 mM of each
primer, in a volume of 100 μl of Vent polymerase reaction mixture containing 10 μl
ThermoPol Buffer, 0.5 mM of each dNTP and 0.5 mM MgSO4. Amplification was
performed using the following cycling scheme:

1. 5 cycles of: 95.5°C - 30 sec, 45°C - 30 sec, 75°C - 2 min.

2. 5 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 75°C - 2 min.

3. 30 cycles of: 95.5°C - 30 sec, 52°C - 30 sec, 75°C - 30 min.

Products were visualized in a 1.5% native agarose gel. A fragment of the expected
size of 300 bp was cloned into the SmaI site of pUC19 and sequenced with the
CircumVent Thermal Cycle DNA sequencing kit (New England Biolabs).
To obtain a larger section of the T.th. dnaA gene, genomic DNA was
digested with either HaeII, HindIII, KasI, KpnI, MM, NcoI, NgoMI, NheI, NsiI,
PaeR7I, PstI, SacI, SalI, SpeI, SphI, StuI, or XhoI, followed by Southern analysis in a
native agarose gel. The filter was probed with the 300 bp PCR product radiolabeled
by random priming. Four different restriction digests showed a single fragment of
reasonable size for further cloning. These were KasI, NgoMI, and StuI, all of which
produced fragments of about 3 kb, and NcoI, which produced a 2 kb fragment. Also, a
KpnI digest resulted in two fragments of about 1.5 kb and 10 kb.

Genomic DNA digests using either NgoMI or StuI were used to
obtain the dnaA gene by inverse PCR (also referred to as circular PCR). In this
procedure, 0.1 μg of DNA from each digest was treated separately with T4 DNA
ligase in 50 μl of ligation buffer (50 mM Tris-HCl (pH 7.8), 10 mM MgCl2, 10 mM
dithiothreitol, 1 mM ATP, 25 mg/ml bovine serum albumin) overnight at 20°C. This
results in circularizing the genomic DNA fragments. The ligation mixtures were used
as substrate in inverse PCR.
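
The logic of inverse PCR, in which two primers directed away from each other amplify the unknown flanking sequence of a circularized restriction fragment, may be pictured with the following Python sketch. It is a simplified model with hypothetical function names and made-up sequences; it assumes exact primer matches, a single primer site each, and a fragment that has self-ligated end to end.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def inverse_pcr_product(fragment, outward_left, outward_right):
    """Predict the top strand of the product from a self-ligated fragment.

    fragment      : top strand of the restriction fragment, 5'->3'
    outward_left  : primer (5'->3') annealing to the top strand and extending
                    toward the 5' end of the fragment
    outward_right : primer (5'->3') identical to a top-strand stretch and
                    extending toward the 3' end of the fragment
    """
    left_site = reverse_complement(outward_left)
    a = fragment.find(left_site)
    b = fragment.find(outward_right)
    if a < 0 or b < 0 or a + len(left_site) > b:
        raise ValueError("primers not found or not facing outward")
    # Read from the right primer to the 3' end, across the ligation
    # junction, and on from the 5' end through the left primer site.
    return fragment[b:] + fragment[:a + len(left_site)]


# Made-up sequences: a known region flanked by unknown sequence on both sides.
flank5, known, flank3 = "ACGTTGCA", "GGCTAAGTCCATG", "TTCAGGAC"
product = inverse_pcr_product(flank5 + known + flank3,
                              reverse_complement(known[:5]), known[-5:])
print(product)
```

In the predicted product the two flanking sequences are joined at the former restriction site and bracketed by the primer sequences, which is why sequencing such a product yields sequence both upstream and downstream of the known region.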
DNA oligonucleotides for amplification of recircularized T.th.
genomic DNA were as follows. The upstream 22mer was
(5'-CTCGTTGGTGAAAGTTTCCGTG-3') (SEQ. ID. No. 52), and the downstream
24mer was (5'-CGTCCAGTTCATCGCCGGAAAGGA-3') (SEQ. ID. No. 53). The
amplification reactions contained 5 ng T.th. genomic DNA, 0.5 μM of each primer, in
a volume of 100 μl of Taq polymerase reaction mixture containing 10 μl PCR Buffer,
0.5 mM of each dNTP and 2.5 mM MgCl2. Amplification was performed using the
following cycling scheme:

1. 5 cycles of: 95.0°C - 30 sec, 55°C - 30 sec, 72°C - 10 min.

2. 35 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 72°C - 8 min.

The PCR fragments of the expected length for NgoMI and StuI treated and then
ligated chromosomal DNA were digested with either BamHI or Sau3a and cloned into
pUC19:BamHI and pUC19:(BamHI+SmaI) and sequenced with CircumVent Thermal
Cycle DNA sequencing kit. The 1.6 kb (BamHI+BamHI) fragment from the NgoMI
PCR product contained a sequence coding for the N-terminal part of dnaN, followed
by the gene for enolase. The 1 kb (Sau3a+Sau3a) fragment from the same PCR
product included the start of dnaN gene and sequence characteristic of the origin of
replication (i.e., 9mer DnaA-binding site sequences). The 0.6 kb (BamHI+BamHI)
fragment from the StuI PCR reaction contained starts for dnaA and gidA genes in
inverse orientation to each other. The 0.4 kb (Sau3a+Sau3a) fragment from the same
PCR product contained the 3' end of the dnaA gene and DNA sequence characteristic
for the origin of replication.

This sequence information provided the beginning and end of both the
dnaA and the dnaN genes. Hence, these genes were easily cloned from this
information. Further, the dnaN gene was readily cloned and expressed in a pET24-a
vector. These steps are described below.

Cloning and sequence of the dnaA gene

The dnaA gene was cloned for sequencing in two parts: from the
potential start of the gene up to its middle and from the middle up to the end. For the
N-terminal part, the upstream 27mer
(5'-TCTGGCAACACGTTCTGGAGCACATCC-3') (SEQ. ID. No. 54) was 20 bp
downstream of the potential start codon of the gene. The downstream 23mer
(5'-TGCTGGCGTTCATCTTCAGGATG-3') (SEQ. ID. No. 55) was approximately
from the middle of the dnaA gene. For the C-terminal part, the upstream 23mer
(5'-CATCCTGAAGATGAACGCCAGCA-3') (SEQ. ID. No. 56) was complementary
to the previous primer. The downstream 25mer
(5'-AGGTTATCCACAGGGGTCATGTGCA-3') (SEQ. ID. No. 57) was 20 bp
upstream of the potential stop codon for the dnaA gene. The amplification reactions
contained 10 ng T.th. genomic DNA, 0.5 μM of each primer, in a volume of 100 μl of
Vent polymerase reaction mixture containing 10 μl ThermoPol Buffer, 0.5 mM of
each dNTP and 0.5 mM MgSO4. Amplification was performed using the following
cycling scheme:

1. 5 cycles of: 95.5°C - 30 sec, 55°C - 30 sec, 75°C - 3 min.

2. 30 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 75°C - 2 min.

Products were visualized in a 1.0% native agarose gel. Fragments of the expected
sizes of 750 bp and 650 bp were produced, and were sequenced using the CircumVent
Thermal Cycle DNA sequencing method (New England Biolabs). The nucleotide and
amino acid sequences of dnaA and its protein product are shown in Fig. 20. The
DnaA protein is homologous to the DnaA proteins of several other bacteria as shown
in Fig. 19.

Cloning and expression of dnaN

The full length dnaN gene was obtained by PCR from T.th. total DNA.
DNA oligonucleotides for amplification of T.th. dnaN were the following: the
upstream 29mer (5'-GTGTGTCATATGAACATAACGGTTCCCAA-3') (SEQ. ID.
No. 58) consists of an NdeI site within the first 11 nucleotides (underlined), followed by
the sequence for the start of the dnaN gene; the downstream 29mer
(5'-GCGCGAATTCTCCCTTGTGGAAGGCTTAG-3') (SEQ. ID. No. 59) consists of
an EcoRI site within the first 10 nucleotides (underlined), followed by the sequence
complementary to a section just downstream of the dnaN stop codon. The
amplification reactions contained 10 ng T.th. genomic DNA, 0.5 μM of each primer,
in a volume of 100 μl of Vent polymerase reaction mixture containing 10 μl
ThermoPol Buffer, 0.5 mM of each dNTP and 0.2 mM MgSO4. Amplification was
performed using the following cycling scheme:

1. 5 cycles of: 95.0°C - 30 sec, 55°C - 30 sec, 75°C - 5 min.

2. 35 cycles of: 95.5°C - 30 sec, 50°C - 30 sec, 75°C - 4 min.

The nucleotide and amino acid sequences of dnaN and the β subunit, respectively, are
shown in Fig. 21. The T.th. β subunit shows limited homology to the β subunit
sequences of several other bacteria over its entire length (Fig. 22).

The approximately 1 kb dnaN gene was cloned into the pET24-a
expression vector using the NdeI and EcoRI restriction sites both in the dnaN
containing PCR product and in pET24-a (Fig. 23). Expression of T.th. β subunit was
obtained under the following conditions: a fresh colony of BL21(DE3) E. coli strain
was transformed by the pET24-a:dnaN plasmid, and then was grown in LB broth
containing 50 mg/ml kanamycin at 37°C until the cell density reached 0.4 OD600. The
cell culture was then induced for dnaN expression upon addition of 2 mM IPTG.
Cells were harvested after 4 additional hours of growth at 37°C. The induction of
the T.th. β subunit is shown in Fig. 24.
Two liters of BL21(DE3)pETdnaN cells were grown in LB media 
containing 50 mg/ml ampicillin at 37°C to an O.D. of 0.8 and then IPTG was added to 
a concentration of 2 mM. After a further 2 h at 37°C, cells were harvested by 
centrifugation and stored at -70°C. The following steps were performed at 4°C. Cells 
were thawed and resuspended in 40 ml of 5 mM Tris-HCl (pH 8.0), 1% sucrose, 1M 
NaCl, 5 mM DTT, and 30 mM spermidine. Cells were lysed using a French Pressure 
cell at 20,000 psi. The lysate was allowed to sit at 4°C for 30 min. and then cell 
debris was removed by centrifugation (Sorvall SS-34 rotor, 45 min. 18,000 rpm). The 
supernatant was incubated at 65°C for 20 minutes with occasional stirring. The 
resulting protein precipitate was removed by centrifugation as described above. The 
supernatant was dialyzed against 4 liters of buffer A containing 50 mM NaCl 
overnight. The dialyzed supernatant was clarified by centrifugation (35 ml, 150 mg 
total) and then loaded onto an 8 ml MonoQ column equilibrated in buffer A 
containing 50 mM NaCl. The column was washed with 5 column volumes of the 
same buffer and then eluted with a 120 ml gradient of buffer A plus 50 mM NaCl to 
buffer A plus 500 mM NaCl. Fractions of 2 ml were collected. Over 50 mg of T.th. β 
was recovered in fractions 5-21. 
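
The position of fractions within the MonoQ salt gradient can be estimated with a short calculation. The following is only a sketch: it assumes the gradient is strictly linear, that fraction 1 starts exactly where the gradient starts, and that column dead volume is negligible, none of which is stated in the text. The function name and its parameters are illustrative.

```python
def gradient_conc_mM(fraction, start_mM=50.0, end_mM=500.0,
                     gradient_ml=120.0, fraction_ml=2.0):
    """Approximate NaCl concentration (mM) at the midpoint of a fraction.

    `fraction` is 1-based. Assumes a linear gradient that begins with
    fraction 1 and ignores dead volume (both are assumptions).
    """
    midpoint_ml = (fraction - 0.5) * fraction_ml
    # Clamp so that fractions collected after the gradient report end_mM.
    progress = min(max(midpoint_ml / gradient_ml, 0.0), 1.0)
    return start_mM + (end_mM - start_mM) * progress


if __name__ == "__main__":
    for n in (5, 21):
        print(f"fraction {n}: ~{gradient_conc_mM(n):.0f} mM NaCl")
```

Under these assumptions, the pooled fractions 5-21 would span roughly the lower third of the 50-500 mM gradient.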

Identification and cloning of T. thermophilus holA 

A search of the incomplete T.th. genome database (www.g21.bio.uni-goettingen.de) 
showed a match to E. coli δ encoded by holA. The sequence obtained 
from the database was as follows (SEQ. ID. No. 185): 

TPKGKDLVRHLENRAKRLGLRLPGGVAQYLA-SLEGDLEALERELEKLALLSP 
-PLTLEKVEKVVALRPPLTGFDLVRSVLEKDPKEALLRLGRLKEEGEEPLRLL 
GALSWQFALLARAFFLLREMPRPKEEDLARLEAHPYAAKKALL-EAARRLTE 
EALKEALDALMEAEKRAKG-GKDPWLALEAAVLRLAR-PAGQPRVD 

Next, the following PCR primers were designed from the codon usage 
of T.th.: upstream 27mer (5'-GCC CAG TAC CTC GCC TCC CTC GAG GGG-3') 
(SEQ. ID. No. 186) and downstream 27mer (5'-GGC CCC CTT GGC CTT CTC 
GGC CTC CAT-3') (SEQ. ID. No. 187) to obtain a partial holA nucleotide sequence 
(SEQ. ID. No. 188): 

AGACTCGAGG CCCTGGAGCG GGAGCTGGAG AAGCTTGCCC TCCTCTCCCC ACCCCTCACC 
CTGGAGAAGG TGGAGAAGGT GGTGGCCCTG AGGCCCCCCC TCACGGGCTT TGACCTGGTG 
CGCTCCGTCC TGGAGAAGGA CCCCAAGGAG GCCCTCCTGC GCCTCAGGCG CCTCAGGGAG 
GAGGGGGAGG AGCCCCTCAG GCTCCTCGGG GCCCTCTCCT GGCAGTTCGC CCTCCTCGCC 
CGGGCCTTCT TCCTCCTCCG GGAAAACCCC AGGCCCAAGG AGGAGGACCT CGCCCGCCTC 
GAGGCCCACC CCTACGCCGC CAAGAAGGCC A 

This sequence codes for a partial amino acid sequence of the T.th. δ 
subunit (SEQ. ID. No. 189): 

RLEALERELEKLALLSPPLTLEKVEKVVALRPPLTGFDLVRSVLEKDPKEALL 
RLRRLREEGEEPLRLLGALSWQFALLARAFFLLRENPRPKEEDLARLEAHPYA 
AKKA 
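
The relationship between the partial nucleotide sequence (SEQ. ID. No. 188) and the partial amino acid sequence (SEQ. ID. No. 189) can be checked by translating in reading frame 1. The sketch below uses the standard genetic code; the helper names are illustrative and only the first 60 nucleotides of SEQ. ID. No. 188 are used as input.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1; whitespace and a trailing partial codon are ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First line of SEQ. ID. No. 188 as printed in the text.
first_line = ("AGACTCGAGG CCCTGGAGCG GGAGCTGGAG "
              "AAGCTTGCCC TCCTCTCCCC ACCCCTCACC")

if __name__ == "__main__":
    print(translate(first_line))  # RLEALERELEKLALLSPPLT
```

The output matches the first 20 residues of SEQ. ID. No. 189, which indicates that the printed nucleotide sequence is read in frame 1.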

The DNA sequence obtained by PCR (SEQ. ID. No. 188) was used to 
design internal primers for inverted PCR. The upstream 31mer 
(5'-GTGGTGTCTAGACATCATAACGGTTCTGGCA-3') (SEQ. ID. No. 190) 
introduced an XbaI site for cloning holA into a pGEX vector. The downstream 27mer 
(5'-GAGGGCCACCACCTTCTCCACCTTCTC-3') (SEQ. ID. No. 191) encodes holA 
sequence EKVEKVVAL (aa residues 159-167 of SEQ. ID. No. 158) on the 
complementary strand. The amplification reactions contained 50 ng T.th. genomic 
DNA and 0.1 uM of each primer in a volume of 100 ul of Vent polymerase reaction 
mixture containing 10 ul ThermoPol Buffer, 2.5 mM of each dNTP, 2 mM MgSO4, 
and 10 ul of formamide. Amplification was performed using the following cycling 
scheme: 

1. 5 cycles of: 95°C - 30 sec, 65°C - 20 sec, 75°C - 5 min. 

2. 5 cycles of: 95°C - 20 sec, 58°C - 10 sec, 75°C - 5 min. 

3. 35 cycles of: 95°C - 20 sec, 50°C - 5 sec, 75°C - 4 min. 
Products were visualized in a 1.0% native agarose gel. A fragment of 1.5 Kb was gel 
purified and partially sequenced. 
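
The statement that a downstream primer encodes a peptide "on the complementary strand" can be verified by taking the reverse complement of the primer and translating it. The sketch below does this for the downstream 27mer (SEQ. ID. No. 191) using the standard genetic code; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA in frame 1, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


downstream_27mer = "GAGGGCCACCACCTTCTCCACCTTCTC"  # SEQ. ID. No. 191

if __name__ == "__main__":
    print(translate(reverse_complement(downstream_27mer)))  # EKVEKVVAL
```

The same check can be applied to any primer in these examples that is described as encoding a peptide on the complementary strand.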

A different set of primers was used to obtain the 3'-end of T.th. holA, 
including an upstream 25mer (5'-CTCCGTCCTGGAGAAGGACCCCAAG-3') 
(SEQ. ID. No. 192), which encoded the amino acid sequence SVLEKDPK from T.th. 
holA (aa residues 179-186 of SEQ. ID. No. 158), and a downstream 29mer 
(5'-CGCGAATTCAACGCSCTCCTCAAGACSCT-3', where S = C or G) (SEQ. ID. No. 
193), which was not related to the holA sequence. The amplification reactions contained 
50 ng T.th. genomic DNA and 0.1 uM of each primer in a volume of 100 ul of Vent 
polymerase reaction mixture containing 10 ul ThermoPol Buffer, 2.5 mM of each 
dNTP, 1-2 mM MgSO4, and 10 ul of formamide. Amplification was performed 
using the following cycling scheme: 

1. 5 cycles of: 95°C - 30 sec, 65°C - 20 sec, 75°C - 5 min. 

2. 5 cycles of: 95°C - 20 sec, 55°C - 10 sec, 75°C - 5 min. 
3. 35 cycles of: 95°C - 20 sec, 50°C - 5 sec, 75°C - 4 min. 

Products were visualized in a 1.0% native agarose gel. A fragment of 1.2 Kb was gel 
purified and partially sequenced to obtain the remainder of the T.th. holA gene. 
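
The three-stage cycling scheme used for the 3'-end amplification lowers the annealing temperature in steps (65°C, 55°C, 50°C). As an illustration, it can be written down as data and summarized. The sketch below is not an instrument program: the class and function names are hypothetical, and the computed time covers only the programmed holds, not ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # ((temperature_C, seconds), ...) for one cycle

    def seconds(self):
        """Total programmed hold time for this stage."""
        return self.cycles * sum(sec for _, sec in self.steps)


# Cycling scheme for the 3'-end of T.th. holA, as listed in the text.
PROGRAM = (
    Stage(5, ((95, 30), (65, 20), (75, 300))),
    Stage(5, ((95, 20), (55, 10), (75, 300))),
    Stage(35, ((95, 20), (50, 5), (75, 240))),
)


def total_cycles(program):
    return sum(stage.cycles for stage in program)


def hold_seconds(program):
    return sum(stage.seconds() for stage in program)


if __name__ == "__main__":
    print(total_cycles(PROGRAM), "cycles,",
          hold_seconds(PROGRAM) / 60, "min of programmed holds")
```

The extension step dominates the run: 45 cycles with 4-5 min extensions account for most of the roughly 3.5 hours of hold time.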

The T.th. holA gene was cloned into the NdeI/EcoRI sites in the pET24 
vector using a pair of primers. The upstream 31mer 
(5'-GACACTTAA CATATG GTCATCGCCTTCACCG-3') (SEQ. ID. No. 194) contains 
an NdeI site within the first 15 nucleotides (underlined) and has a sequence 
corresponding to the 5' region of T.th. holA. The downstream 38mer 
(5'-GTGTGT GAATTC GGGTCAACGGGCGAGGCGGAGGACCG-3') (SEQ. ID. 
No. 195) contains an EcoRI site within the first 12 nucleotides (underlined) and has a 
sequence complementary to the 3' end of the holA gene. 
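
The stated primer lengths and restriction site positions can be confirmed with a simple string search. The sketch below checks the two holA cloning primers (SEQ. ID. Nos. 194 and 195) against the NdeI (CATATG) and EcoRI (GAATTC) recognition sequences; the helper name `site_end` is illustrative.

```python
# Recognition sequences of the two enzymes used for cloning into pET24.
SITES = {"NdeI": "CATATG", "EcoRI": "GAATTC"}


def clean(primer):
    """Remove the spaces that set off the restriction site in the printed primer."""
    return "".join(primer.split()).upper()


def site_end(primer, enzyme):
    """1-based position of the last base of the first site found, or None."""
    start = clean(primer).find(SITES[enzyme])
    return None if start < 0 else start + len(SITES[enzyme])


upstream = "GACACTTAA CATATG GTCATCGCCTTCACCG"            # SEQ. ID. No. 194
downstream = "GTGTGT GAATTC GGGTCAACGGGCGAGGCGGAGGACCG"   # SEQ. ID. No. 195

if __name__ == "__main__":
    print(len(clean(upstream)), site_end(upstream, "NdeI"))        # 31 15
    print(len(clean(downstream)), site_end(downstream, "EcoRI"))   # 38 12
```

Both results agree with the text: the NdeI site ends at nucleotide 15 of the 31mer and the EcoRI site ends at nucleotide 12 of the 38mer.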



EXAMPLE 14 

Identification of T.th. holB encoding δ' subunit 

To clone the ends of the T.th. holB gene, it was assumed that the order of 
genes in Thermus thermophilus could be the same as in the related Deinococcus 
radiodurans. Multiple alignment of the upstream neighbor (probable 
phosphoesterase, DNA repair Rad24c related protein) revealed a conserved region 
close to the C-terminus of the protein sequence: 



Deinococcus radiodurans     VILNPGSVGQ   (SEQ. ID. No. 196) 
Methanococcus jannaschii    YLINPGSVGQ   (SEQ. ID. No. 197) 
Thermotoga maritima         LVLNPGSAGR   (SEQ. ID. No. 198) 



The D.rad. sequence was used to design an upstream 28mer primer 
(5'-CTGGTGAACCCGGGCTCCGTGGGCCAGC-3') (SEQ. ID. No. 199) that encodes 
the amino acid sequence LLVNPGSVGQ (SEQ. ID. No. 200), and a downstream 
27mer (5'-CTCGAGGAGCTTGAGGAGGGTGTTGGC-3') (SEQ. ID. No. 201) that 
encodes the sequence ANTLLKLLE (SEQ. ID. No. 202) on the complementary 
strand. The amplification reactions contained 50 ng T.th. genomic DNA and 0.1 uM 
of each primer in a volume of 100 ul of Deep Vent polymerase reaction mixture 
containing 10 ul ThermoPol Buffer, 2.5 mM of each dNTP, 1.5 mM MgSO4, and 
10 ul formamide. Amplification was performed using the following cycling scheme: 

1. 5 cycles of: 95°C - 30 sec, 68°C - 20 sec, 75°C - 3 min. 

2. 5 cycles of: 95°C - 20 sec, 63°C - 20 sec, 75°C - 3 min. 

3. 35 cycles of: 95°C - 20 sec, 55°C - 10 sec, 75°C - 3 min. 
Product was visualized in a 1.0% native agarose gel as a single band of 0.7 Kb. The 
fragment was purified and partially sequenced. 

Multiple alignment of the gene downstream of D.rad. identified the 
following conserved region: 

Deinococcus radiodurans   GFGGVQLHAAHGYLLSQFLSPRHNVREDEYGG   (SEQ. ID. No. 203) 
Caenorhabditis elegans    GFDGIQLHGAHGYLLSQFTSPTTNKRVDKYGG   (SEQ. ID. No. 204) 
Pseudomonas aeruginosa    GFSGVEIHAAHGYLLSQFLSPLSNRRSDAWGG   (SEQ. ID. No. 205) 
Archaeoglobus fulgidus    GFDAVQLHAAHGYLLSEFISPHVNRRKDEYGG   (SEQ. ID. No. 206) 

The fragment in bold was used to design primers, specifically the 
downstream primer, for cloning of the 3' region of the T.th. holB gene. The upstream 
30mer (5'-CATCCTGGACTCGGCCCACCTCCTCACCGA-3') (SEQ. ID. No. 207) 
encodes the amino acid sequence ILDSAHLLT (SEQ. ID. No. 208). The downstream 
33mer (5'-GAGGAGGTAGCCGTGGGCCGCGTGGAGCTCCAC-3') (SEQ. ID. 
No. 209) encodes the sequence VELHAAHGYLL (SEQ. ID. No. 210) on the 
complementary strand. The amplification reactions contained 50 ng T.th. genomic 
DNA and 0.1 uM of each primer in a volume of 100 ul of Deep Vent polymerase 
reaction mixture containing 10 ul ThermoPol Buffer, 2.5 mM of each dNTP, 2 mM 
MgSO4, and 10 ul DMSO. Amplification was performed using the following cycling 
scheme: 

1. 5 cycles of: 95°C - 30 sec, 70°C - 20 sec, 75°C - 4 min. 

2. 5 cycles of: 95°C - 20 sec, 66°C - 20 sec, 75°C - 4 min. 

3. 30 cycles of: 95°C - 20 sec, 60°C - 10 sec, 77°C - 4 min. 
Products were visualized in a 1.0% native agarose gel as a single band of 1.1 kb. The 
1.1 kb fragment was gel purified and sequenced to provide the remainder of the holB 
gene encoding T.th. δ'. 

For protein expression, the T.th. holB gene was cloned into the pET24 
vector at the NdeI:EcoRI sites using a pair of primers. The upstream 32mer 
(5'-GGCTTTCC CATATG GCTCTACACCCGGCTCAC-3') (SEQ. ID. No. 211) contains 
an NdeI site within the first 15 nucleotides (underlined) and the sequence 
corresponding to the 5' region of T.th. holB. The downstream 29mer 
(5'-GCGT GGATCC ACGGTCATGTCTCTAAGTC-3') (SEQ. ID. No. 212) contains a 
BamHI site within the first 10 nucleotides (underlined) and a sequence 
complementary to the 3' end of the holB gene. 


EXAMPLE 15 

Alternate synthetic path in absence of clamp loader activity 

As discussed earlier, the Pol III-type enzyme of the present invention is 
capable of application and use in a variety of contexts, including a method wherein 
the clamp loader component that is traditionally involved in the initiation of enzyme 
activity is not required. The clamp loader generally functions to increase the 
efficiency of ring assembly onto circular primed DNA, because both the ring and the 
DNA are circles and one must be broken transiently for them to become interlocked 
rings. In such a reaction, the clamp loader increases the efficiency of opening the 
ring. 

The procedure described below illustrates the instance where the clamp 
loader need not be present. For example, the β clamp can be assembled onto DNA in 
the absence of the clamp loader. Particularly, the bulk of primed templates in PCR 
reactions are linear ssDNA fragments that are primed at the ends. On linear primed 
DNA, the ring need not open at all. Instead, the ring can simply thread onto the end 
of the linear primed template (Bauer and Burgers, 1988; Tan et al., 1986; O'Day et al., 
1992; Burgers and Yoder, 1993). Hence, on linear primed templates, such as those 
generated in PCR, the beta clamp can simply slide over the DNA end. After the ring 
slides onto the end, the DNA polymerase can associate with the ring for enhanced 
DNA synthesis. 

Such "end assembly" is common among Pol III-type enzymes and has 
been demonstrated in yeast and human systems. Rings assembling onto linear DNA 
for use by their respective DNA polymerases are shown in the following example, 
demonstrated in the E. coli bacterial system, in the human system, and in the T.th. 
system. 

The bulk of the primed templates in PCR reactions are linear ssDNA 
fragments that are primed at their ends. However, these end primed linear fragments 
are not generated until after the first step of PCR has already been performed. In the 
very first step, PCR primers generally anneal at internal sites in a heat denatured 
ssDNA template. Primed linear templates are then generated in subsequent steps 
enabling use of this alternate path. For this first step, the clamp may be assembled 
onto an internal site in the absence of the clamp loader using special conditions that 
allow clamp assembly in the absence of a clamp loader. 

For example, a set of conditions that lead to assembly of the clamp 
onto circular DNA (i.e., internal primed sites) have been described in the protocol for 
the use of the bacteriophage T4 ring shaped clamp (gene 45 protein) without the 
clamp loader (Reddy et al., 1993). In this case, polyethylene glycol leads to 
"macromolecular crowding" such that the clamp and DNA are pushed together in 
close proximity, leading to the ring self assembling onto internal primed sites on 
circular DNA. Other possible conditions that may lead to assembly of rings onto 
internal sites include use of a high concentration of beta, and use of heat or 
denaturant to break the dimeric ring into two half rings (crescents) followed by 
lowering the heat (or dilution or removal of denaturant), leading to rings assembling 
around the DNA. 

The ring shaped sliding clamps of E. coli and human slide over the end 
of linear DNA to activate their respective DNA polymerase in the absence of the 
clamp loader. This clamp loader independent assay is performed in the bacterial 
system in Fig. 25A. For this assay, the linear template is polydA primed with 
oligodT. The polydA is of average length 4500 nucleotides and was purchased from 
SuperTecs. OligodT35 was synthesized by Oligos etc. The template was prepared 
using 145 ul of 5.2 mM (as nucleotide) polydA and 22 ul of 1.75 mM (as nucleotide) 
oligodT. The mixture was incubated in a final volume of 2100 ul T.E. buffer (ratio as 
nucleotide was 21:1 polydA to oligodT). The mixture was heated to boiling in a 1 ml 
Eppendorf tube, then removed and allowed to cool to room temperature. Assays were 
performed in a final volume of 25 ul 20 mM Tris-Cl (pH 7.5), 8 mM MgCl2, 5 mM 
DTT, 0.5 mM EDTA, 40 mg/ml BSA, 4% glycerol, containing 20 uM [α-32P]dTTP, 
0.1 ug polydA-oligodT, 25 ng Pol III and, where present, 5 pg of β subunit. Proteins 
were added to the reaction on ice, then shifted to 37°C for 5 min. DNA synthesis was 
quantitated using DE81 paper as described (Rowen and Kornberg, 1978). 

In the linear template assay, no ATP or dATP is provided and 
therefore, a clamp loader, even if present, is not active. Thus, the clamp (e.g., β) can 
only stimulate the DNA polymerase provided the clamp threads onto the DNA (see 
diagram in Fig. 25). Hence, threading of the clamp is shown by a stimulation of the 
DNA polymerase. In lane 1 of Fig. 25A, the DNA polymerase is incubated with the 
linear DNA in the absence of the clamp, and lane 2 shows the result of adding the 
clamp. The results show that the clamp is able to thread onto the DNA ends and 
stimulate the DNA polymerase in the absence of ATP and thus, in the absence of 
clamp loading as well. 

This clamp loader independent assay is performed in the human system 
in Fig. 25B. The assay reaction (25 ul) contains 50 mM Tris-HCl (pH 7.8), 8 mM 
MgCl2, 1 mM DTT, 1 mM creatine phosphate, 40 ng/ml bovine serum albumin, 0.55 
ug human SSB, 100 ng PCNA (where present), 7 units DNA polymerase delta (1 unit 
incorporates 1 pmol dTMP in 60 min.), 40 mM [α-32P]dTTP and 0.1 ug 
polydA-oligodT. Proteins were added to the reaction on ice, then shifted to 37°C for 
60 min. DNA synthesis was quantitated using DE81 paper as described (Rowen and 
Kornberg, 1978). In lane 3 (Fig. 25), the DNA polymerase δ is incubated with the 
linear DNA in the absence of the clamp, and lane 4 shows the result of adding the 
PCNA clamp. The results demonstrate that the clamp is able to thread onto the DNA 
ends and stimulate the DNA polymerase in the absence of ATP and thus, the absence 
of clamp loading. 

This clamp loader independent assay is performed in the T.th. system 
in Fig. 25C. The assay reaction is exactly as described above for use of the E. coli Pol 
III and beta system except the temperature is 60°C and here the Pol III is HEP.P1 
T.th. Pol III (0.5 ul, providing 0.1 units where one unit is equal to 1 pmol of dTTP 
incorporated in 1 minute under these conditions and in the absence of beta), and the 
beta subunit is 7 ug T.th. β (from the MonoQ column). Proteins were added to the 
reaction on ice, then shifted to 37°C for 60 min. DNA synthesis was quantitated 
using DE81 paper as described (Rowen and Kornberg, 1978). In lane 3 (Fig. 25C), 
the T.th. Pol III is incubated with the linear DNA in the absence of the clamp, and 
lane 4 shows the result of adding the T.th. β clamp. The results demonstrate that the 
clamp is able to thread onto the DNA ends and stimulate the DNA polymerase in the 
absence of clamp loader activity. 

EXAMPLE 16 

Use of T.th. Pol III in long chain primer extension 

A characteristic of Pol III-type enzymes is their ability to extend a 
single primer for several kilobases around a long (e.g. 7 kb) circular single stranded 
DNA genome of a bacteriophage. This reaction uses the circular β clamp protein. 
For the circular β to be assembled onto a circular DNA genome, the circular β must be 
opened, positioned around the DNA, and then closed. This assembly of the circular 
beta around DNA requires the action of the clamp loader, which uses ATP to open 
and close the ring around DNA. In this example, the 7.2 kb circular single strand 
DNA genome of bacteriophage M13mp18 was used as a template. This template was 
primed with a single DNA 57mer oligonucleotide and the Pol III enzyme was tested 
for conversion of this template to a double strand circular form (RFII). The reaction 
was supplemented with recombinant T.th. β produced in E. coli. This assay is 
summarized in the scheme at the top of Fig. 26. M13mp18 ssDNA was phenol 
extracted from phage purified as described (Turner and O'Donnell, 1995). M13mp18 
ssDNA was primed with a 57mer DNA oligomer synthesized by Oligos etc. The 
replication assays contained 73 ng singly primed M13mp18 ssDNA and 100 ng T.th. 
β subunit in a 25 ul reaction containing 20 mM Tris-HCl (pH 7.5), 8 mM MgCl2, 40 
ug/ml BSA, 0.1 mM EDTA, 4% glycerol, 0.5 mM ATP, 60 uM each of dCTP, dGTP, 
dATP and 20 uM α-32P-TTP (specific activity 2,000-4,000 cpm/pmol). Either T.th. 
Pol III from the Heparin peak 1 (HEP.P1; 5 ul, 0.21 units where 1 unit equals 1 pmol 
nucleotide incorporated in 1 min.) or a non-Pol III from the Heparin peak 2 (HEP.P2; 
5 ul, 2.6 units) was added to the reaction. Reactions were shifted to 60°C for 5 
min., and then DNA synthesis was quenched upon adding 25 ul of 1% SDS, 40 mM 
EDTA. One half of the reaction was analyzed in a 0.8% native agarose gel, and the 
other half was quantitated using DE81 paper as described (Studwell and O'Donnell, 
1990). 
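
The conversion from DE81 counts to incorporated nucleotide and to enzyme units follows from the definitions given above (specific activity in cpm/pmol; 1 unit equals 1 pmol nucleotide incorporated in 1 min). The sketch below is illustrative only: the function names are hypothetical, the cpm value in the usage example is invented for demonstration and is not a result from this work, and the calculation counts labeled TTP only, without correcting for the unlabeled dNTPs or for background.

```python
def pmol_from_cpm(cpm, cpm_per_pmol, fraction_counted=0.5):
    """pmol of labeled nucleotide incorporated in the whole reaction.

    fraction_counted is the portion of the reaction spotted on DE81 paper
    (one half in this assay, the other half being run on the gel).
    """
    if cpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    if not 0 < fraction_counted <= 1:
        raise ValueError("fraction_counted must be in (0, 1]")
    return cpm / cpm_per_pmol / fraction_counted


def units(pmol, minutes=5.0):
    """Enzyme units, with 1 unit = 1 pmol nucleotide incorporated per minute."""
    return pmol / minutes


if __name__ == "__main__":
    # Hypothetical reading: 3000 cpm at 3000 cpm/pmol (mid-range specific activity).
    total = pmol_from_cpm(3000, 3000)
    print(total, "pmol;", units(total), "units")
```

Because the stated specific activity spans 2,000-4,000 cpm/pmol, any such estimate carries the same twofold uncertainty unless the specific activity of the particular label stock is measured.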

The results of the assay are shown in Fig. 26. Lane 1 is the result 
obtained using the T.th. Pol III (HEP.P1) which was capable of extending the primer 
around the ssDNA circle to form RFII. Lane 2 shows the result of using the non-Pol 
III (HEP.P2) which was not capable of this extension and produced only incomplete 
DNA products (the result shown included 0.8 ug E. coli SSB which did not increase 
the chain length of the product). In the absence of SSB, the same product was 
observed, although the band contained more counts. The greater amount of total 
synthesis observed in lane 2 is due to the build up of immature products in a small 
region of the gel. The presence of immature products in lane 1 is likely due to a 
contaminating polymerase in the preparation that cannot convert the single primer to 
the full length RFII form. Alternatively, the presence of incomplete products in lane 1 
(Pol III type enzyme) is due to secondary structure in the DNA which causes the Pol 
III to pause. In this case it may be presumed that performing the reaction at higher 
temperature could remove the secondary structure barrier. Alternatively, SSB could 
be added to the assay (although T.th. SSB would be needed, because addition of E. 
coli SSB was tried and did not alter the quality of the product profile). Generally, SSB 
is needed to remove secondary structure elements from ssDNA at 37°C for complete 
extension of primers by mesophilic Pol III-type enzymes. 

The assay described above was performed at 60°C. The T.th. Pol III 
HEP.P1 gained activity as the temperature was increased from 37°C to 60°C, as 
expected for an enzyme from a thermophilic source. The E. coli Pol III lost activity at 
60°C compared to 37°C, as expected for an enzyme from a mesophilic source. 

EXAMPLE 17 

Materials used in Examples 18-29 

Radioactive nucleotides were from Dupont NEN; unlabeled nucleotides 
were from Pharmacia Upjohn. DNA oligonucleotides were synthesized by Gibco 
BRL. M13mp18 ssDNA was purified from phage that was isolated by two successive 
bandings in cesium chloride gradients. M13mp18 ssDNA was primed with a 30-mer 
(map position 6817-6846) as described. The pET protein expression vectors and 
BL21 (DE3) protein expression strain of E. coli were purchased from Novagen. DNA 
modification enzymes were from New England Biolabs. Aquifex aeolicus genomic 
DNA was a gift of Dr. Robert Huber and Dr. Karl Stetter (Regensburg University, 
Germany). Protein concentrations were determined by absorbance at 280 nm using 
extinction coefficients calculated from their known Trp and Tyr content using the 
equation $\varepsilon_{280} = \mathrm{Trp}_m\,(5690\ \mathrm{M^{-1}\,cm^{-1}}) + \mathrm{Tyr}_n\,(1280\ \mathrm{M^{-1}\,cm^{-1}})$, 
where m and n are the numbers of Trp and Tyr residues in the protein. 
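
As an illustration of this calculation, the sketch below counts Trp (W) and Tyr (Y) residues in a one-letter amino acid sequence, applies the two coefficients given above, and converts an absorbance reading to molar concentration with the Beer-Lambert law. The function names, the 1 cm path length default, and the short test peptide are assumptions made for the example and do not come from the text.

```python
TRP_COEFF = 5690.0  # M^-1 cm^-1 per Trp, as given in the text
TYR_COEFF = 1280.0  # M^-1 cm^-1 per Tyr, as given in the text


def extinction_280(sequence):
    """Molar extinction coefficient at 280 nm from Trp and Tyr content."""
    seq = sequence.upper()
    return seq.count("W") * TRP_COEFF + seq.count("Y") * TYR_COEFF


def molar_concentration(a280, sequence, path_cm=1.0):
    """Protein concentration (M) from A280 via Beer-Lambert: A = eps * c * l."""
    eps = extinction_280(sequence)
    if eps == 0:
        raise ValueError("sequence has no Trp or Tyr; A280 cannot be used")
    return a280 / (eps * path_cm)


if __name__ == "__main__":
    peptide = "MWYYK"  # hypothetical peptide: one Trp, two Tyr
    print(extinction_280(peptide))              # 8250.0
    print(molar_concentration(0.825, peptide))  # about 1e-4 M
```

Multiplying the molar concentration by the protein's molecular mass gives the mg/ml values of the kind reported in the purification examples.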


EXAMPLE 18 

Purification of α Encoded by dnaE 

The Aquifex aeolicus dnaE gene was previously identified (Deckert et 
al., 1998). The dnaE was obtained by searching the Aquifex aeolicus genome with 
the amino acid sequence of T.th. α subunit (encoded by dnaE). The dnaE gene was 
amplified from Aquifex aeolicus genomic DNA by PCR using the following primers: 
the upstream 37mer (5'-GTGTGT CATATG AGTAAG 
GATTTCGTCCACCTTCACC-3') (SEQ. ID. No. 157) contains an NdeI site 
(underlined); the downstream 34mer 
(5'-GTGTGTGGATCCGGGGACTACTCGGAAGTAAGGG-3') (SEQ. ID. No. 158) 
contains a BamHI site (underlined). The PCR product was digested with NdeI and 
BamHI, purified, and ligated into the pET24 NdeI and BamHI sites to produce 
pETAadnaE. 

The pETAadnaE plasmid was transformed into the BL21 (DE3) strain 
of E. coli. Cells were grown in 50L of LB containing 100ng/ml of kanamycin, 5mM 
MgSO4 at 37°C to OD600 = 2.0, induced with 2mM IPTG for 20h at 20°C, then 
collected by centrifugation. Cells were resuspended in 400ml 50mM Tris-HCl (pH 
7.5), 10% sucrose, 1M NaCl, 30mM spermidine, 5mM DTT and 2mM EDTA. The 
following procedures were performed at 4°C. Cells were lysed by passing them twice 
through a French Press (15,000 psi) followed by centrifugation at 13,000 rpm for 90 
min at 4°C. In this protein preparation, as well as each of those that follow, the 
induced Aquifex aeolicus protein was easily discernible as a large band in an SDS 
polyacrylamide gel stained with Coomassie Blue. Hence, column fractions were 
assayed for the presence of the Aquifex aeolicus protein by SDS PAGE analysis, 
which forms the basis for pooling column fractions. 

The clarified cell lysate was heated to 65°C for 30 min and the 
precipitate was removed by centrifugation at 13,000 rpm in a GSA rotor for 1h. The 
supernatant (1.4gm, 280ml) was dialyzed against buffer A (20mM Tris-HCl (pH 
7.5), 10% glycerol, 0.5 mM EDTA, 5mM DTT) overnight, then diluted to 320ml 
with buffer A to a conductivity equal to 100mM NaCl. The dialysate was applied to a 
150ml Fast Flow Q (FFQ) Sepharose column (Pharmacia) equilibrated in buffer A, 
and eluted with a 1.5L linear gradient of 0-500mM NaCl in buffer A. Eighty fractions 
were collected. Fractions 38-58 (1g, 390ml) were pooled, dialyzed versus buffer A 
overnight, and applied to a 250ml Heparin Agarose column (Bio-Rad) equilibrated 
with buffer A. Protein was eluted with a 1L linear 0-5mM NaCl gradient in buffer A. 
One hundred fractions were collected. Fractions 69-79 (320 mg in 200 ml) were 
pooled and dialyzed against buffer A containing 100 mM NaCl. The α preparation 
was aliquoted and stored frozen at -80°C (see Fig. 27). 
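
The protein amounts reported at each stage of this preparation can be summarized as step and overall recoveries. The sketch below uses the three totals given above (1.4 g, 1 g, 320 mg); these are total protein, so the percentages describe protein recovery rather than recovery of α activity, which the text does not report. The function and variable names are illustrative.

```python
# Total protein (mg) after each stage of the preparation, from the text.
STEPS = [
    ("heat-treated supernatant", 1400.0),
    ("FFQ Sepharose pool", 1000.0),
    ("Heparin Agarose pool", 320.0),
]


def recoveries(steps):
    """Return (name, mg, % of previous step, % of first step) for each stage."""
    first = steps[0][1]
    previous = first
    rows = []
    for name, mg in steps:
        rows.append((name, mg, 100.0 * mg / previous, 100.0 * mg / first))
        previous = mg
    return rows


if __name__ == "__main__":
    for name, mg, step_pct, overall_pct in recoveries(STEPS):
        print(f"{name:26s} {mg:7.0f} mg  step {step_pct:5.1f}%  overall {overall_pct:5.1f}%")
```

With these figures, about 23% of the protein present after heat treatment is retained in the final Heparin Agarose pool.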

EXAMPLE 19 

Purification of δ Encoded by holA 

The Aquifex aeolicus holA gene was not previously identified by the 
genome sequencing group at Diversa (Deckert et al., 1998). Aquifex aeolicus holA 
was identified by searching the Aquifex aeolicus genome with the amino acid 
sequence of the T.th. δ subunit (encoded by holA). The Aquifex aeolicus holA was 
amplified by PCR using the following primers: the upstream 36mer 
(5'-GTGTGTCATATGGAAACCACAATATTCCAGTTCCAG-3') (SEQ. ID. No. 159) 
contains an NdeI site (underlined); the downstream 39mer 
(5'-GTGTGTGGATCCTTATCCACCATGAGAAGTATTTTTCAC-3') (SEQ. ID. No. 
160) contains a BamHI site (underlined). The PCR product was digested with NdeI 
and BamHI, purified, and ligated into the pET24 NdeI and BamHI sites to produce 
pETAaholA. 

The pETAaholA plasmid was transformed into E. coli strain BL21 
(DE3). Cells were grown in 50L of LB media containing 100ng/ml kanamycin. Cells 
were grown at 37°C to OD600 = 2.0, induced for 20h upon addition of 2mM IPTG, 
then collected by centrifugation. Cells from 25L of culture were lysed as described in 
Example 18. 

The cell lysate was heated to 65°C for 30 min and the precipitate was 
removed by centrifugation. The supernatant (650mg, 240ml) was dialyzed against 
buffer A, adjusted to a conductivity equal to 160mM NaCl by addition of 40ml of 
buffer A, and applied to a 220ml Heparin Agarose column equilibrated in buffer A 
containing 100mM NaCl. The column was eluted with a 1.0L linear gradient of 
150-700mM NaCl in buffer A. One hundred and four fractions were collected. Fractions 
45-56 were pooled (250mg, 210 ml), diluted with 230ml buffer A to a conductivity 
equal to 230mM NaCl, then loaded onto a 100ml FFQ Sepharose column equilibrated 
in buffer A containing 150mM NaCl. The column was eluted with a 200ml linear 
gradient of 150-750mM NaCl in buffer A; seventy-three fractions were collected. 
Fractions 16-38 were pooled (95mg, 40ml), aliquoted, and stored at -80°C (see Fig. 
27). 
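
The dilution of the Heparin Agarose pool before loading onto FFQ Sepharose is a simple mixing calculation. The sketch below assumes that conductivity scales linearly with NaCl concentration and that buffer A itself contributes negligible conductivity; neither assumption is stated in the text, and the function names are illustrative.

```python
def conc_before_dilution(final_mM, sample_ml, diluent_ml, diluent_mM=0.0):
    """Salt concentration of the sample before the diluent was added (mass balance)."""
    total_ml = sample_ml + diluent_ml
    return (final_mM * total_ml - diluent_mM * diluent_ml) / sample_ml


def diluent_needed_ml(start_mM, target_mM, sample_ml):
    """Volume of salt-free buffer needed to bring a sample down to target_mM."""
    if not 0 < target_mM <= start_mM:
        raise ValueError("target must be positive and not exceed the start value")
    return sample_ml * (start_mM / target_mM - 1.0)


if __name__ == "__main__":
    # 210 ml pool diluted with 230 ml buffer A to the equivalent of 230 mM NaCl.
    pool_mM = conc_before_dilution(230.0, 210.0, 230.0)
    print(f"pool before dilution: ~{pool_mM:.0f} mM NaCl equivalent")
    print(f"buffer needed: {diluent_needed_ml(pool_mM, 230.0, 210.0):.0f} ml")
```

Under these assumptions the pooled fractions were at roughly 480 mM NaCl equivalent, which lies within the 150-700 mM range of the gradient from which they were collected.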

EXAMPLE 20 

Purification of δ' Encoded by holB 

The Aquifex aeolicus holB gene was previously identified by the 
genome sequencing facility at Diversa (Deckert et al., 1998). The Aquifex aeolicus 
holB sequence was obtained by searching the Aquifex aeolicus genome with the 
sequence of the T.th. δ' (encoded by holB). The Aquifex aeolicus holB gene was 
amplified by PCR using the following primers: the upstream 39mer 
(5'-GTGTGT CATATG GAAAAAGTTTTTTTTGGAAAAAACTCCAG-3') (SEQ. ID. 
No. 161) contains an NdeI site (underlined); the downstream 35mer 
(5'-GTGTGTGGATCCTTAATCCGCCTGAACGGCTAACG-3') (SEQ. ID. No. 162) 
contains a BamHI site (underlined). The PCR product was digested with NdeI and 
BamHI, purified, and ligated into the pET24 NdeI and BamHI sites to produce 
pETAaholB. 

The pETAaholB plasmid was transformed into E. coli strain BL21 
(DE3). Cells were grown at 37°C in 50L media containing 100ug/ml kanamycin to 
OD600 = 2.0, then induced for 3h upon addition of 0.2mM IPTG. Cells were collected 
by centrifugation and were lysed using lysozyme by the heat lysis procedure (Wickner 
and Kornberg, 1974). The cell lysate was heated to 65°C for 30 min and precipitate 
was removed by centrifugation. The supernatant (2.4g, 400ml) was dialyzed versus 
buffer A, then applied to a 220ml FFQ Sepharose column equilibrated in buffer A. 
Protein was eluted with a 1L linear gradient of 0-500mM NaCl in buffer A; eighty 
fractions were collected. Fractions 23-30 were pooled and diluted 2-fold with buffer 
A to a conductivity equal to 100mM NaCl, then loaded onto a 200ml Heparin 
Agarose column equilibrated in buffer A. Protein was eluted with a 1L linear gradient 
of 0-1.0M NaCl in buffer A; eighty-four fractions were collected. Fractions 46-66 
were pooled (1.3g, 395ml), dialyzed versus buffer A containing 100mM NaCl, then 
aliquoted and stored frozen at -80°C (see Fig. 27). 

EXAMPLE 21 

Purification of τ Encoded by dnaX 

The Aquifex aeolicus dnaX gene was previously identified (Deckert et 
al., 1998). The dnaX gene sequence was obtained by searching the Aquifex aeolicus 
genome with the sequence of T.th. τ subunit (encoded by dnaX). The Aquifex 
aeolicus dnaX was amplified by PCR using the following primers: the upstream 
41mer (5'-GTGTGTCATATGAACTACGTTCCCTTCGCGAGAAAGTACAG-3') 
(SEQ. ID. No. 163) contains an NdeI site (underlined); the downstream 36mer 
(5'-GTGTGTGGATCCTTAAAACAGCCTCGTCCCGCTGGA-3') (SEQ. ID. No. 164) 
contains a BamHI site (underlined). The PCR product was digested with NdeI and 
BamHI, purified, and ligated into the pET24 NdeI and BamHI sites to produce 
pETAadnaX. 

The pETAadnaX plasmid was transformed into E. coli strain BL21 
(DE3). Cells were grown in 50L LB containing 100 ug/ml kanamycin at 37°C to 
OD600 = 0.6, then induced for 20h at 20°C upon addition of IPTG to 0.2mM. Cells 
were collected by centrifugation and lysed as described in Example 18. The clarified 
cell lysate was heated to 65°C for 30 min and the protein precipitate was removed by 
centrifugation. The supernatant (1.1g in 340ml) was treated with 0.228g/ml 
ammonium sulfate followed by centrifugation. The τ subunit remained in the pellet 
which was dissolved in buffer B (20mM Hepes (pH 7.5), 0.5mM EDTA, 2mM DTT, 
10% glycerol) and dialyzed versus buffer B to a conductivity equal to 87mM NaCl. 
The dialysate (1073mg, 570ml) was applied to a 200ml FFQ Sepharose column 
equilibrated in buffer A. The column was eluted with a 1.5L linear gradient of 
0-500mM NaCl in buffer A; eighty fractions were collected. Fractions 28-37 were 
pooled (289mg, 138ml), dialyzed against buffer A to a conductivity equal to 82mM 
NaCl, then loaded onto a 150ml column of Heparin Agarose equilibrated in buffer A. 
The column was eluted with a 900ml linear gradient of 0-500mM NaCl in buffer A; 
thirty-two fractions were collected. Fractions 15-18 (187mg, 110ml) were dialyzed 
versus buffer A, then aliquoted and stored at -80°C (see Fig. 27). 
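
The ammonium sulfate step is specified per unit volume (0.228 g/ml), so the mass of salt to add follows directly from the supernatant volume. The sketch below is a minimal illustration with a hypothetical function name; it takes the stated ratio at face value and does not attempt to convert it to percent saturation.

```python
def ammonium_sulfate_grams(volume_ml, grams_per_ml=0.228):
    """Mass of solid ammonium sulfate to add to a given volume of supernatant."""
    if volume_ml < 0:
        raise ValueError("volume must be non-negative")
    return volume_ml * grams_per_ml


if __name__ == "__main__":
    # 340 ml of heat-treated supernatant, as in the text.
    print(f"{ammonium_sulfate_grams(340.0):.1f} g ammonium sulfate")
```

For the 340 ml supernatant described above, this corresponds to about 77.5 g of ammonium sulfate.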

EXAMPLE 22 

Purification of β Encoded by dnaN 

The Aquifex aeolicus dnaN gene was previously identified (Deckert et 
al., 1998). The dnaN sequence was obtained by searching the Aquifex aeolicus 
genome with the sequence of T.th. β subunit (encoded by dnaN). The Aquifex 
aeolicus dnaN gene was amplified by PCR using the following primers: the upstream 
33mer (5'-GTGTGTCATATGCGCGTTAAGGTGGACAGGGAG-3') (SEQ. ID. 
No. 165) contains an NdeI site (underlined); the downstream 36mer 
(5'-TGTGT CTCGAG TCATGGCTACACCCTCATCGGCAT-3') (SEQ. ID. No. 166) 
contains an XhoI site (underlined). The PCR product was digested with NdeI and 
BamHI, purified, and ligated into the pET24 NdeI and BamHI sites to produce 
pETAadnaN. 

The pETAadnaN plasmid was transformed into E. coli strain BL21 
(DE3). Cells were grown in 1L LB containing 100mg/ml kanamycin at 37°C to 
OD600 = 1.0, then induced for 6h upon addition of 2mM IPTG. Cells were collected 
(7g) and lysed as described in Example 18. The cell lysate was heated to 65°C for 30 
min and the protein precipitate was removed by centrifugation. The supernatant 
(39mg, 45ml) was applied to a 10ml DEAE Sephacel column (Pharmacia) 
equilibrated in buffer A. The column was eluted with a 100ml linear gradient of 
0-500mM NaCl in buffer A; seventy-five fractions were collected. Fractions 45-57 were 
pooled (18.7mg), dialyzed versus buffer A, and applied to a 30ml Heparin Agarose 
column equilibrated in buffer A. The column was eluted with a 300ml linear gradient 
of 0-500mM NaCl in buffer A; sixty-five fractions were collected. Fractions 27-33 
were pooled (11mg, 28ml) and stored at -80°C (see Fig. 27). 

EXAMPLE 23 

Purification of SSB Encoded by ssb 

The Aquifex aeolicus ssb gene was previously identified (Deckert et 
al., 1998). The ssb gene sequence was obtained by searching the Aquifex aeolicus 
genome with the sequence of T.th. SSB (encoded by ssb). The Aquifex aeolicus ssb 
gene was amplified by PCR using the following primers: the upstream 47mer 
(5'-GTGTGT CATATG CTCAATAAGGTTTTTATAATAGGAAGACTTACGGG-3') 
(SEQ. ID. No. 167) contains an NdeI site (underlined); the downstream 39mer 
(5'-GTGT GGATCC TTAAAAAGGTATTTCGTCCTCTTCATCGG-3') (SEQ. ID. No. 
168) contains a BamHI site (underlined). The PCR product was digested with NdeI 
and BamHI, purified, and ligated into the pET16 NdeI and BamHI sites to produce 
pETAassb. 

The pETAassb plasmid was transformed into E. coli strain BL21 
(DE3). Cells were grown in 6L of LB media containing 200ng/ml ampicillin. Cells 
were grown at 37°C to OD600 = 0.6, then induced at 15°C overnight in the presence of 
2mM IPTG and collected by centrifugation. Cells were lysed as described above in 
Example 18, except cells were resuspended in buffer C (20mM Tris-HCl (pH 7.9), 
500mM NaCl). 

The cell lysate was heated to 65°C for 30 min, then the precipitate was 
removed by centrifugation. The supernatant (1.4g, 190ml) was applied to a 25ml
Chelating Sepharose column (Pharmacia-Biotech) charged with 50mM Nickel Sulfate
and then equilibrated in buffer C containing 5mM Imidazole. The column was eluted
with a 300ml linear gradient of 5-100mM Imidazole in buffer C. Fractions of 4ml
were collected. Fractions 81-92 were pooled (~240mg in 48ml) and dialyzed
overnight against 2L of buffer B containing 200mM NaCl. The dialysate was diluted
to a conductivity equal to 92mM NaCl using buffer A and then loaded onto an 8ml
MonoQ column equilibrated in buffer A containing 100mM NaCl. The column was
eluted with a 120ml linear gradient of 100-500mM Imidazole in buffer A. Seventy-four
fractions were collected. Fractions 57-70 were pooled (100mg, 25ml), aliquoted,
and stored at -80°C (see Fig. 27).

EXAMPLE 24

MonoQ Preparation of τδδ'

The δ subunit (0.29mg) purified in Example 19 and δ' subunit
(0.31mg) purified in Example 20 were mixed in a volume of 2.8ml of buffer A at
15°C. After 30min, the τ subunit (0.5mg in 1.4ml), purified in Example 21, was
added and the reaction was incubated a further 1h at 15°C. The reaction was applied
to a 1ml MonoQ column equilibrated in buffer A. The τδδ' complex elutes later than
either τ, δ or δ' alone. Protein was eluted with a 32ml linear gradient of 100-500mM
NaCl in buffer A; eighty fractions were collected. Analysis of the MonoQ fractions in
an SDS polyacrylamide gel shows a peak of τδδ' complex that elutes in fractions
32-38 (see Fig. 28). The peak fractions (850µg) were stored at -80°C. This procedure
can easily be scaled up. For example, a much larger amount of τδδ' was constituted
by following a similar protocol and using an 8ml MonoQ column, which yielded 9.6mg
of τδδ'.

EXAMPLE 25 

Constitution of ατδδ' Complex

The reaction mixture contained 1.2 mg α subunit (9 nmol; 133,207 Da)
purified in Example 18, 0.41 mg τ subunit (7.5 nmol; 54,332 Da) purified in
Example 21, 0.41 mg δ subunit (10 nmol; 40,693 Da) purified in Example 19, and 0.2
mg δ' subunit (9 nmol; 29,000 Da) purified in Example 20 in 1.1 ml buffer A. The α
and τ subunit solutions were premixed in 871 µl for 2 h at 15°C before adding δ and δ'
subunit solution, then the complete mixture was allowed to incubate an additional
12 h at 15°C. The reaction may not require an order of addition, or these extended
incubation times. The reaction mixture was concentrated to 200 µl using a Centricon
30 at 4°C, then applied to an FPLC Superose 6 HR 10/30 column (25ml) at 4°C
developed with a continuous flow of buffer A containing 100mM NaCl. After the
first 216 drops (6.6ml), fractions of 7 drops each were collected. Fractions were
analyzed on an SDS polyacrylamide gel stained with Coomassie Blue (Fig. 29). The
analysis was repeated using the α subunit alone (Fig. 29). The results show that the
peak fractions of α shift to a considerably earlier position when τ, δ and δ' are present
and α comigrates with τ, δ, and δ', when compared to the elution position of α alone,
indicating that α assembles with τ, δ and δ' into an ατδδ' complex.
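
The molar amounts quoted for each subunit follow from the stated mass and molecular weight. The Python sketch below is a minimal illustration of that conversion, using only the masses and molecular weights given in this Example; the names `SUBUNITS` and `mg_to_nmol` are hypothetical. Any difference between a computed value and the figure quoted in the text should be checked against the original filing.

```python
# Masses (mg) and molecular weights (Da) as stated in Example 25.
SUBUNITS = {
    "alpha": (1.2, 133_207),
    "tau": (0.41, 54_332),
    "delta": (0.41, 40_693),
    "delta_prime": (0.2, 29_000),
}

def mg_to_nmol(mass_mg: float, mw_da: float) -> float:
    """Convert a protein mass in mg to nanomoles (1 Da corresponds to 1 g/mol)."""
    grams = mass_mg * 1e-3
    moles = grams / mw_da
    return moles * 1e9

for name, (mass_mg, mw_da) in SUBUNITS.items():
    print(f"{name}: {mg_to_nmol(mass_mg, mw_da):.1f} nmol")
```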

EXAMPLE 26 

ατδδ' Functions with the β Clamp

Replication reactions were performed using circular M13mp18 ssDNA
primed with a synthetic DNA 90mer oligonucleotide. Reactions contained 8.6µg
primed M13mp18 ssDNA, 9.4µg SSB purified in Example 23, 1.0µg ατδδ' prepared
in Example 25, and 2.0µg β subunit purified in Example 22 (when present), in 230µl
of 20mM Tris-HCl (pH 7.5), 5mM DTT, 4% glycerol, 8mM MgCl2, 0.5mM ATP,
60µM each dATP and dGTP (buffer composition is for a final volume of 250µl).
Reactions were mixed on ice, then aliquoted into separate tubes containing 25µl each.
For each timed reaction, the mixture was brought to 65°C for 2 min before initiating
synthesis upon addition of 2µl of dCTP and α-32P-dTTP (final concentrations, 60 and
40µM, respectively). Aliquots were quenched at the times indicated in Fig. 30 upon
adding 4µl of 0.25M EDTA, 1% SDS. Quenched reactions were then analyzed in a
0.8% alkaline agarose gel. The results, illustrated in Fig. 30, demonstrate that
efficient synthesis requires addition of the β subunit. Comparison with size standards
in the same gel indicates an average speed of ~125 nucleotides/s; the leading edge of
the product smear indicates a maximum speed of 375 nucleotides/s.
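
To put the reported speeds in context, the sketch below estimates how long one pass around the primed circular template would take at the average and maximum rates given above. It is illustrative only: the template length of 7,249 nucleotides for M13mp18 is an assumption that is not stated in this Example, and the function name `seconds_to_copy` is hypothetical.

```python
# Assumed length of the M13mp18 template in nucleotides (not stated in the text).
TEMPLATE_NT = 7249

def seconds_to_copy(length_nt: int, rate_nt_per_s: float) -> float:
    """Time in seconds to synthesize `length_nt` nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s

for label, rate in (("average", 125), ("maximum", 375)):
    print(f"{label} rate {rate} nt/s: {seconds_to_copy(TEMPLATE_NT, rate):.0f} s per template")
```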

EXAMPLE 27 

Purification of T.th. α subunit

To obtain T.th. α subunit, 8 L of E. coli BL21(DE3) cells harboring
pETtthalpha were grown to O.D. = 0.3 and induced upon adding IPTG. Cells were
collected by centrifugation and resuspended in 200 ml 50mM Tris-HCl (pH 7.5), 10%
sucrose, 1M NaCl, 30mM spermidine, 5mM DTT and 2mM EDTA. The following
procedures were performed at 4°C. Cells were lysed by passing them three times
through a French Press (20,000 psi) followed by incubation at 4°C for 30 min and
then centrifugation at 18,000 rpm in an SS-34 rotor for 45 min at 4°C. Induced
protein was less than 1% of total cell protein but was discernible as a band that migrated
in the appropriate position for its predicted molecular weight in an SDS
polyacrylamide gel stained with Coomassie Blue. Hence, column fractions were
assayed for the presence of the protein by SDS PAGE analysis, which forms the basis
for pooling column fractions.

The clarified cell lysate was heated to 65°C for 30 min and the
precipitate was removed by centrifugation. The supernatant (1.4gm, 280ml) was
dialyzed against buffer A (20mM Tris-HCl (pH 7.5), 10% glycerol, 0.5 mM EDTA,
5mM DTT) overnight, then diluted to 320ml with buffer A to a conductivity equal to
100mM NaCl. The dialysate (approximately 150 mg) was applied to a 60ml DEAE
Fast Flow Q (FFQ) Sepharose column (Pharmacia) equilibrated in buffer A, and
eluted with a 600 ml linear gradient of 0-500mM NaCl in buffer A. Fractions of 8 ml
each were collected. The T.th. α subunit could be seen as a major band in several
fractions, especially in fractions 26-30. In these peak fractions the T.th. α subunit was
approximately 20-30 percent pure.
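
As a rough guide to where the peak fractions fall on the gradient, the sketch below converts fraction numbers into a nominal NaCl concentration. It is an approximation under stated assumptions: fraction 1 is taken to begin at the start of the 600 ml gradient, and the column and tubing dead volume is ignored, so the true eluting concentration would be somewhat lower. The function name `gradient_conc_mM` is hypothetical.

```python
def gradient_conc_mM(volume_ml: float, gradient_ml: float = 600.0,
                     start_mM: float = 0.0, end_mM: float = 500.0) -> float:
    """Nominal salt concentration after `volume_ml` of a linear gradient has been delivered."""
    progress = min(max(volume_ml / gradient_ml, 0.0), 1.0)
    return start_mM + progress * (end_mM - start_mM)

FRACTION_ML = 8.0          # fraction size stated in the text
first, last = 26, 30       # peak fractions stated in the text

low = gradient_conc_mM((first - 1) * FRACTION_ML)   # start of fraction 26
high = gradient_conc_mM(last * FRACTION_ML)         # end of fraction 30
print(f"Fractions {first}-{last}: nominally {low:.0f}-{high:.0f} mM NaCl")
```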

EXAMPLE 28 

Purification of T.th. ε subunit

The dnaQ gene was cloned into the pET16 expression plasmid using
the Val within the context "VGLWEW..." and transformed into E. coli BL21(DE3).
This pET plasmid places an N-terminal leader containing six histidines onto the
expressed protein to facilitate purification via use of chelate affinity chromatography.
Twelve liters of cells were grown to an OD of 0.7 and induced with IPTG. Induced
cells were collected by centrifugation and resuspended in 150 ml of buffer C (20mM
Tris-HCl (pH 7.9), 500mM NaCl). Cells were lysed by passing them two times
through a French Press (20,000 psi) followed by incubation at 4°C for 30 min and
then centrifugation at 13,800 rpm in an SLA-1500 rotor for 45 min at 4°C. Induced
protein appeared greater than 5% of total cell protein and was easily discernible as a
band that migrated in the appropriate position for its predicted molecular weight in an
SDS polyacrylamide gel stained with Coomassie Blue. Hence, column fractions were
assayed for the presence of the protein by SDS PAGE analysis, which forms the basis
for pooling column fractions.

Upon analyzing the precipitate from the cell lysis, and the supernatant,
it was determined that the epsilon subunit was insoluble and appeared in the
precipitate. Therefore the cell pellet was resuspended in 100 ml of binding buffer
containing 6M freshly deionized urea. This resuspension was then placed in
centrifuge bottles and spun at 13,800 rpm for 45 min in the SLA-1500 rotor. The
epsilon was in the supernatant and was applied to a 25 ml Chelating Sepharose
column (Pharmacia-Biotech) charged with 50 mM Nickel Sulfate and then
equilibrated in buffer C containing 5mM Imidazole. The column was washed with
two column volumes of buffer C, then washed with 5 column volumes of buffer C
containing 80 mM Imidazole (final). Then the T.th. epsilon was eluted with a 250 ml
linear gradient of 60-1000 mM Imidazole in buffer C. Fractions of 4ml were
collected. Fractions 15-24 were pooled (~131 mg) and dialyzed overnight against 2L
of buffer A containing 6M urea, but no NaCl or glycerol. The dialysate was then
loaded onto an 8ml MonoQ column equilibrated in buffer A containing 6M urea. The
column was eluted with a 120ml linear gradient of 0-500 mM NaCl in buffer A
containing urea. Sixty-five fractions were collected. The epsilon is approximately
80-90 percent pure at this stage. Fractions 13-17 were stored at -80°C. The epsilon is
in urea but is at a concentration of 5-10 mg/ml, and thus can be used with other
proteins by diluting it such that the final urea concentration is less than 0.5 M. This
level of urea does not generally denature protein, and should allow epsilon to renature
for catalytic activity.
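
The dilution described above can be worked through numerically. The sketch below, offered only as an illustration, computes the smallest dilution factor that brings 6 M urea down to 0.5 M and the protein concentration that remains at that dilution for the stated 5-10 mg/ml stock. The function names are hypothetical; because the text calls for less than 0.5 M urea, the dilution actually used would need to exceed the factor shown.

```python
STOCK_UREA_M = 6.0       # urea concentration of the stored fractions
MAX_FINAL_UREA_M = 0.5   # upper bound on final urea stated in the text

def min_dilution_factor(stock_m: float, max_final_m: float) -> float:
    """Fold dilution at which the final concentration equals `max_final_m`."""
    return stock_m / max_final_m

def diluted(conc: float, factor: float) -> float:
    """Concentration remaining after a `factor`-fold dilution."""
    return conc / factor

factor = min_dilution_factor(STOCK_UREA_M, MAX_FINAL_UREA_M)
print(f"Dilute more than {factor:.0f}-fold")
for protein_mg_ml in (5.0, 10.0):
    print(f"{protein_mg_ml} mg/ml stock -> at most {diluted(protein_mg_ml, factor):.2f} mg/ml")
```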

EXAMPLE 29 

Temperature optimum of Aquifex and Thermus α subunit DNA polymerases

The temperature optimum of the alpha subunits of the Aquifex and
Thermus replicases was tested in the calf thymus DNA replication assay. In this
experiment, the reactions were assembled on ice in 25 µl containing 2.5 µg calf
thymus activated DNA, and either 0.88 µg Aquifex α, or 0.6 µg of the Thermus α
DEAE pool of peak fractions (obtained from Examples 18 and 28, respectively) in 20
mM Tris-HCl (pH 8.8), 8 mM MgCl2, 10 mM KCl, 10 mM (NH4)2SO4, 2 mM
MgSO4, 0.1% Triton X-100, 60 µM each dATP, dCTP, dGTP, and 20 µM α-32P-dTTP.
Reactions were shifted to either 30, 40, 50, 60, 70, 80, or 90°C, then stopped
after 5 minutes and spotted onto DE81 filters to quantitate DNA synthesis. The
results, illustrated in Figs. 31-32, show that these enzymes increase in activity as the
temperature is raised. The Thermus α has a broad peak of activity from 70-80°C
(Fig. 31), while the Aquifex α is maximal at 80°C (Fig. 32). The Aquifex α retains
considerable activity at 90°C, whereas the Thermus α is nearly inactive at 90°C, a
result that is consistent with the higher temperature at which the Aquifex aeolicus may
live relative to the Thermus bacterium.

EXAMPLE 30 

Temperature optimum of Aquifex ατδδ'/β

Aquifex α, β, τδδ', SSB and ατδδ' were tested for stability at different
temperatures by incubating the protein in a solution, followed by performing a
replication assay of the protein. Incubation was performed in 0.4 ml tubes under
mineral oil. The 5 µl reaction mixture contained: buffer B (20 mM Tris-HCl (pH
7.5), 5 mM DTT, 5 mM EDTA), and either: 0.352 µg of α (Fig. 33A), 0.2 µg of β
(Fig. 33B), 0.125 µg τ complex (Fig. 33C), 0.32 µg SSB and 0.042 µg primed
M13mp18 ssDNA (Fig. 33D), or 0.82 µg Pol III* (Fig. 33E). Reactions were incubated
for 2 min at either 70, 80, 85, or 90°C in the presence of either 0.1% Triton X-100
(filled diamonds); 0.05% Tween-20 and 0.01% NP-40 (filled circles); 4 mM CaCl2
(filled triangles); 40% Glycerol (inverted filled triangles); 0.01% Triton X-100, 0.05%
Tween-20, 0.01% NP-40, 4 mM CaCl2 (half-filled square); 40% Glycerol, 0.1%
Triton X-100 (open diamonds); 40% Glycerol, 0.05% Tween-20, 0.01% NP-40 (open
circles); 40% Glycerol, 4 mM CaCl2 (open triangles); 40% Glycerol, 0.01% Triton
X-100, 0.05% Tween-20, 0.01% NP-40, 4 mM CaCl2 (half-filled diamonds). After
heating, reactions were shifted to ice and 20 µl of replication assay buffer was added
followed by incubation for 1.5 min at 70°C; 15 µl was then spotted onto a DE81 filter
and DNA synthesis was quantitated. The replication assay buffer contained: 60 mM
Tris-HCl (pH 9.1 at 25°C), 8mM MgCl2, 18 mM (NH4)2SO4, 2 mM ATP, 60 µM each
of dATP, dCTP, dGTP, and 20 µM [α-32P] TTP (specific activity 10,000 cpm/pmol),
and 0.264 µg primed M13mp18 ssDNA. To assay for β, 0.1 ng ατδδ' was added to
the reaction. To assay τδδ', 0.9 ng β and 0.17 ng α were added to the reaction. To
assay for SSB, 0.17 ng E. coli β and 0.1 ng E. coli ατδδ' were added to the reaction
followed by incubation for 1.5 min at 37°C. To assay for ατδδ', 0.9 ng β was added
to the reaction. To assay α, the calf thymus DNA replication assay was performed in
the buffer as described above but 2.5 µg activated calf thymus DNA was used instead
of primed M13mp18 ssDNA, no other replication proteins were added, and incubation
was for 8 min at 70°C.

WHAT IS CLAIMED:

1. An isolated DNA molecule from a thermophilic bacterium, the
isolated DNA molecule encoding a DNA polymerase III-type enzyme subunit.

2. The isolated DNA molecule according to claim 1, wherein the
enzyme subunit is selected from the group consisting of alpha, beta, tau, gamma, 
epsilon, delta, delta prime, and SSB subunits. 

3. The isolated DNA molecule according to claim 2, wherein the
enzyme subunit is a delta subunit. 

4. The isolated DNA molecule according to claim 3, wherein the 
thermophilic bacterium is Aquifex aeolicus. 

5. The isolated DNA molecule according to claim 4, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 124. 

6. The isolated DNA molecule according to claim 4, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 123 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 123 under 
stringent conditions. 

7. The isolated DNA molecule according to claim 3, wherein the 
thermophilic bacterium is Thermus thermophilus. 

8. The isolated DNA molecule according to claim 7, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 158. 

9. The isolated DNA molecule according to claim 7, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 157 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 157 under 
stringent conditions.

10. The isolated DNA molecule according to claim 3, wherein the
thermophilic bacterium is Thermotoga maritima.

11. The isolated DNA molecule according to claim 10, wherein the
delta subunit comprises an amino acid sequence of SEQ. ID. No. 146. 

12. The isolated DNA molecule according to claim 10, wherein the
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 145 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 145 under 
stringent conditions. 

13. The isolated DNA molecule according to claim 3, wherein the
thermophilic bacterium is Bacillus stearothermophilus.

14. The isolated DNA molecule according to claim 13, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 178. 

15. The isolated DNA molecule according to claim 13, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 177 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 177 under 
stringent conditions. 

16. The isolated DNA molecule according to claim 2, wherein the 
replication enzyme subunit is a delta prime subunit. 

17. The isolated DNA molecule according to claim 16, wherein the
thermophilic bacterium is Aquifex aeolicus. 

18. The isolated DNA molecule according to claim 17, wherein the
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 126. 

19. The isolated DNA molecule according to claim 17, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 125 or hybridizes to
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 125 under
stringent conditions. 

20. The isolated DNA molecule according to claim 16, wherein the
thermophilic bacterium is Thermus thermophilus.

21. The isolated DNA molecule according to claim 20, wherein the
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 156. 

22. The isolated DNA molecule according to claim 20, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 155 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 155 under 
stringent conditions. 

23. The isolated DNA molecule according to claim 16, wherein the 
thermophilic bacterium is Thermotoga maritima.

24. The isolated DNA molecule according to claim 23, wherein the 
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 146. 

25. The isolated DNA molecule according to claim 23, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 147 or hybridizes to 
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 147 under 
stringent conditions. 

26. The isolated DNA molecule according to claim 16, wherein the 
thermophilic bacterium is Bacillus stearothermophilus.

27. The isolated DNA molecule according to claim 26, wherein the 
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 180. 

28. The isolated DNA molecule according to claim 26, wherein the 
DNA molecule comprises a nucleotide sequence of SEQ. ID. No. 179 or hybridizes to
a DNA molecule comprising the nucleotide sequence of SEQ. ID. No. 179 under
stringent conditions. 

29. An isolated replication enzyme subunit of a thermophilic 
bacterium which is encoded by the isolated DNA molecule of claim 1.

30. The isolated replication enzyme subunit according to claim 29, 
wherein the replication enzyme subunit is selected from the group consisting of alpha,
beta, tau, gamma, epsilon, delta, delta prime, and SSB subunits. 

31. The isolated replication enzyme subunit according to claim 30,
wherein the replication enzyme subunit is a delta subunit. 

32. The isolated replication enzyme subunit according to claim 31,
wherein the thermophilic bacterium is Aquifex aeolicus. 

33. The isolated replication enzyme subunit according to claim 32, 
wherein the delta subunit comprises an amino acid sequence of SEQ. ID. No. 124. 

34. The isolated replication enzyme subunit according to claim 31,
wherein the thermophilic bacterium is Thermus thermophilus.

35. The isolated replication enzyme subunit according to claim 34,
wherein the delta subunit comprises an amino acid sequence of SEQ. ID. No. 158. 

36. The isolated replication enzyme subunit according to claim 31,
wherein the thermophilic bacterium is Thermotoga maritima. 

37. The isolated replication enzyme subunit according to claim 36, 
wherein the delta subunit comprises an amino acid sequence of SEQ. ID. No. 146. 

38. The isolated replication enzyme subunit according to claim 31,
wherein the thermophilic bacterium is Bacillus stearothermophilus.

39. The isolated replication enzyme subunit according to claim 38,
wherein the delta subunit comprises an amino acid sequence of SEQ. ID. No. 178.

40. The isolated replication enzyme subunit according to claim 30, 
wherein the replication enzyme subunit is a delta prime subunit. 

41. The isolated replication enzyme subunit according to claim 40,
wherein the thermophilic bacterium is Aquifex aeolicus. 

42. The isolated replication enzyme subunit according to claim 41, 
wherein the delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 
126. 

43. The isolated replication enzyme subunit according to claim 40, 
wherein the thermophilic bacterium is Thermus thermophilus.

44. The isolated replication enzyme subunit according to claim 43, 
wherein the delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 
156. 

45. The isolated replication enzyme subunit according to claim 40, 
wherein the thermophilic bacterium is Thermotoga maritima. 

46. The isolated replication enzyme subunit according to claim 45, 
wherein the delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 
148. 

47. The isolated replication enzyme subunit according to claim 40, 
wherein the thermophilic bacterium is Bacillus stearothermophilus.



48. The isolated replication enzyme subunit according to claim 47, 
wherein the delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 
180.

49. An expression system comprising an expression vector into
which is inserted a heterologous DNA molecule according to claim 1.

50. The expression system according to claim 40, wherein the 
heterologous DNA molecule is in sense orientation and correct reading frame. 

51. A host cell comprising a heterologous DNA molecule 
according to claim 1.

52. A method of producing a recombinant thermostable DNA
polymerase III-type enzyme, or subunit thereof, from a thermophilic bacterium, said
method comprising:

transforming a host cell with at least one heterologous DNA molecule
according to claim 1 under conditions suitable for expression of the DNA polymerase
III-type enzyme, or subunit thereof, and

isolating the DNA polymerase III-type enzyme, or subunit thereof.

53. The method according to claim 52, wherein the enzyme subunit 
is selected from the group consisting of alpha, beta, tau, gamma, epsilon, delta, delta 
prime, and SSB subunits. 

54. The method according to claim 53, wherein the enzyme subunit 
is a delta or delta prime subunit. 

55. The method according to claim 54, wherein the thermophilic 
bacteria is Thermus thermophilus, Aquifex aeolicus, Thermotoga maritima, or 
Bacillus stearothermophilus.

56. The method according to claim 52, wherein said transforming
is carried out by transforming the host cell with a plurality of heterologous DNA
molecules according to claim 1 under conditions suitable for expression of the DNA
polymerase III-type enzyme, or a plurality of subunits thereof, and said isolating is
carried out by isolating the DNA polymerase III-type enzyme, or the plurality of
subunits thereof.



57. An isolated clamp loader of a DNA polymerase III-type
enzyme comprising either a heterologously expressed delta subunit, a heterologously 
expressed delta prime subunit, or both, derived from a thermophilic eubacteria. 

58. The isolated clamp loader according to claim 57, wherein the 
thermophilic bacteria is a Thermus species, a Thermotoga species, an Aquifex species, 
or a Bacillus species. 

59. The isolated clamp loader according to claim 58, wherein the 
thermophilic bacteria is Thermus thermophilus.

60. The isolated clamp loader according to claim 59, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 158. 

61. The isolated clamp loader according to claim 59, wherein the
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 156. 

62. The isolated clamp loader according to claim 58, wherein the 
thermophilic bacteria is Thermotoga maritima. 

63. The isolated clamp loader according to claim 62, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 146. 

64. The isolated clamp loader according to claim 62, wherein the 
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 148. 

65. The isolated clamp loader according to claim 58, wherein the 
thermophilic bacteria is Aquifex aeolicus. 

66. The isolated clamp loader according to claim 65, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 124.

67. The isolated clamp loader according to claim 65, wherein the
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 126. 

68. The isolated clamp loader according to claim 58, wherein the 
thermophilic bacteria is Bacillus stearothermophilus. 

69. The isolated clamp loader according to claim 68, wherein the 
delta subunit comprises an amino acid sequence of SEQ. ID. No. 178. 



70. The isolated clamp loader according to claim 68, wherein the 
delta prime subunit comprises an amino acid sequence of SEQ. ID. No. 180.

ABSTRACT OF THE INVENTION

The present invention relates to an isolated DNA molecule from a
thermophilic bacterium which encodes a DNA polymerase III-type enzyme subunit.
Also encompassed by the present invention are host cells and expression systems
including the heterologous DNA molecule of the present invention, as well as isolated
replication enzyme subunits encoded by such DNA molecules. Also disclosed is a
method of producing a recombinant thermostable DNA polymerase III-type enzyme,
or subunit thereof, from a thermophilic bacterium, which is carried out by
transforming a host cell with at least one heterologous DNA molecule of the present
invention under conditions suitable for expression of the DNA polymerase III-type
enzyme, or subunit thereof, and then isolating the DNA polymerase III-type enzyme,
or subunit thereof.

ATGGGCCGGGAGCTCCGCTTCGCCCACCTCCACCAGCACA

CCCAGTTCTCCCTCCTGGACGGGGCGGCGAAGCTTTCCGA 

CCTCCTCAAGTGGGTCAAGGAGACGACCCCCGAGGACCCC 120

GCCTTGGCCATGACCGACCACGGCAACCTCTTCGGGGCCG 

TGGAGTTCTACAAGAAGGCCACCGAAATGGGCATCAAGCC

CATCCTGGGCTACGAGGCCTACGTGGCGGCGGAAAGCCGC 240

TTTGACCGCAAGCGGGGAAAGGGCCTAGACGGGGGCTACT 

TTCACCTCACCCTCCTCGCCAAGGACTTCACGGGGTACCA 

GAACCTGGTGCGCCTGGCGAGCCGGGCTTACCTGGAGGGG 360

TTTTACGAAAAGCCCCGGATTGACCGGGAGATCCTGCGCG 

AGCACGCCGAGGGCCTCATCGCCCTCTCGGGGTGCCTCGG 

GGCGGAGATCCCCCAGTTCATCCTCCAGGACCGTCTGGAC 480

CTGGCCGAGGCCCGGCTCAACGAGTACCTCTCCATCTTCA 

AGGACCGCTTCTTCATCGAGATCCAGAACCACGGCCTCCC 

CGAGCAGAAAAAGGTCAACGAGGTCCTCAAGGAGTTCGCC 600

CGAAAGTACGGCCTGGGGATGGTGGCCACCAACGACGGCC 

ATTACGTGAGGAAGGAGGACGCCCGCGCCCACGAGGTCCT

CCTCGCCATCCAGTCCAAGAGCACCCTGGACGACCCCGGG 720

CGCTGGCGCTTCCCCTGCGACGAGTTCTACGTGAAGACCC 

CCGAGGAGATGCGGGCCATGTTCCCCGAGGAGGAGTGGGG 

GGACGAGCCCTTTGACAACACCGTGGAGATCGCCCGCATG 840

TGCAACGTGGAGCTGCCCATCGGGGACAAGATGGTCTACC 

GAATCCCCCGCTTCCCCCTCCCCGAGGGGCGGACCGAGGC 

CCAGTACCTCATGGAGCTCACCTTCAAGGGGCTCCTCCGC 960

CGCTACCCGGACCGGATCACCGAGGGCTTCTACCGGGAGG 

TCTTCCGCCTTTTGGGGAAGCTTCCCCCCCACGGGGACGG 

GGAGGCCTTGGCCGAGGCCTTGGCCCAGGTGGAGCGGGAG 1080

GCTTGGGAGAGGCTCATGAAGAGCCTCCCCCCTTTGGCCG 

GGGTCAAGGAGTGGACGGCGGAGGCCATTTTCCACCGGGC 

CCTTTACGAGCTTTCCGTGATAGAGCGCATGGGGTTTCCC 1200

GGCTACTTCCTCATCGTCCAGGACTACATCAACTGGGCCC 

GGAGAAACGGCGTCTCCGTGGGGCCCGGCAGGGGGAGCGC 

CGCCGGGAGCCTGGTGGCCTACGCCGTGGGGATCACCAAC 1320

ATTGACCCCCTCCGCTTCGGCCTCCTCTTTGAGCGCTTCC 

TGAACCCGGAGAGGGTCTCCATGCCCGACATTGACACGGA 

CTTCTCCGACCGGGAGCGGGACCGGGTGATCCAGTACGTG 1440

CGGGAGCGCTACGGCGAGGACAAGGTGGCCCAGATCGGCA 

CCCTGGGAAGCCTCGCCTCCAAGGCCGCCCTCAAGGACGT 

GGCCCGGGTCTACGGCATCCCCCACAAGAAGGCGGAGGAA 1560

TTGGCCAAGCTCATCCCGGTGCAGTTCGGGAAGCCCAAGC 

CCCTGCAGGAGGCCATCCAGGTGGTGCCGGAGCTTAGGGC 

GGAGATGGAGAAGGACCCCAAGGTGCGGGAGGTCCTCGAG 1680

GTGGCCATGCGCCTGGAGGGCCTGAACCGCCACGCCTCCG 

TCCACGCCGCCGGGGTGGTGATCGCCGCCGAGCCCCTCAC 

GGACCTCGTCCCCCTCATGCGCGACCAGGAAGGGCGGCCC 1800

GTCACCCAGTACGACATGGGGGCGGTGGAGGCCTTGGGGC 

TTTTGAAGATGGACTTTTTGGGCCTCCGCACCCTCACCTT 



FIG. 16A 



GTGGAGCTGGACTACGATGCCCTCCCCCTGGACGACCCCA 

AGACCTTCGCCCTCCTCTCCCGGGGGGAGACCAAGGGGGT 

CTTCCAGCTGGAGTCGGGGGGGATGACCGCCACGCTCCGC 2040

GGCCTCAAGCCGCGGCGCTTTGAGGACCTGATCGCCATCC 

TCTCCCTCTACCGCCCCGGGCCCATGGAGCACATCCCCAC 

CTACATCCGCCGCCACCACGGGCTGGAGCCCGTGAGCTAC 2160

AGCGAGTTTCCCCACGCCGAGAAGTACCTAAAGCCCATCC 

TGGACGAGACCTACGGCATCCCCGTCTACCAGGAGCAGAT 

CATGCAGATCGCCTCGGCCGTGGCGGGGTACTCCCTGGGC 2280

GAGGCGGACCTCCTGCGGCGGTCCATGGGCAAGAAGAAGG 

TGGAGGAGATGAAGTCCCACCGGGAGCGCTTCGTCCAGGG

GGCCAAGGAAAGGGGCGTGCCCGAGGAGGAGGCCAACCGC 2400

CTCTTTGACATGCTGGAGGCCTTCGCCAACTACGGCTTCA 

ACAAATCCCACGCTGCCGCCTACAGCCTCCTCTCCTACCA 

GACCGCCTACGTGAAGGCCCACTACCCCGTGGAGTTCATG 2520

GCCGCCCTCCTCTCCGTGGAGCGGCACGACTCCGACAAGG 

TGGCCGAGTACATCCGCGACGCCCGGGCCATGGGCATAGA 

GGTCCTTCCCCCGGACGTCAACCGCTCCGGGTTTGACTTC 2640

CTGGTCCAGGGCCGGCAGATCCTTTTCGGCCTCTCCGCGG 

TGAAGAACGTGGGCGAGGCGGCGGCGGAGGCCATTCTCCG 

GGAGCGGGAGCGGGGCGGCCCCTACCGGAGCCTCGGCGAC 2760

TTCCTCAAGCGGCTGGACGAGAAGGTGCTCAACAAGCGGA 

CCCTGGAGTCCCTCATCAAGGCGGGCGCCCTGGACGGCTT 

CGGGGAAAGGGCGCGGCTCCTCGCCTCCCTGGAAGGGCTC 2880

CTCAAGTGGGCGGCCGAGAACCGGGAGAAGGCCCGCTCGG 

GCATGATGGGCCTCTTCAGCGAAGTGGAGGAGCCGCCTTT 

GGCCGAGGCCGCCCCCCTGGACGAGATCACCCGGCTCCGC 3000

TACGAGAAGGAGGCCCTGGGGATCTACGTCTCCGGCCACC 

CCATCTTGCGGTACCCCGGGCTCCGGGAGACGGCCACCTG 

CACCCTGGAGGAGCTTCCCCACCTGGCCCGGGACCTGCCG 3120

CCCCGGTCTAGGGTCCTCCTTGCCGGGATGGTGGAGGAGG 

TGGTGCGCAAGCCCACAAAGAGCGGCGGGATGATGGCCCG 

CTTCGTCCTCTCCGACGAGACGGGGGCGCTTGAGGCGGTG 3240

GCATTCGGCCGGGCCTACGACCAGGTCTCCCCGAGGCTCA 

AGGAGGACACCCCCGTGCTCGTCCTCGCCGAGGTGGAGCG

GGAGGAGGGGGGCGTGCGGGTGCTGGCCCAGGCCGTTTGG 3360

ACCTACGAGGAGCTGGAGCAGGTCCCCCGGGCCCTCGAGG 

TGGAGGTGGAGGCCTCCCTCCTGGACGACCGGGGGGTGGC 

CCACCTGAAAAGCCTCCTGGACGAGCACGCGGGGACCCTC 3480

CCCCTGTACGTCCGGGTCCAGGGCGCCTTCGGCGAGGCCC 

TCCTCGCCCTGAGGGAGGTGCGGGTGGGGGAGGAGGCTGT 

AGGCGGCCGCGTGGTTCCGGGCCTACCTCCTGCCCGACCG 3600

GGAGGTCCTTCTCCAGGGCGGCCAGGCGGGGGAGGCCCAG 

GAGGCGGTGCCCTTCTAGGGGGTGGGCCGTGAGACCTAGC 

GCCATCGTTCTCGCCGGGGGCAAGGAGGCCTGGGCCCGAC 3720

CCCTTTTGG

MGRELRFAHLHQHTQFSLLDGAPKLSDLLKWVEETTPEDP
ALAMTDHGNLFGAVEFYKKATEMGIKPILGYEAYVAAESR
FDRKRGKGLDGGYFHLTLLAKDFTGYQNLVRLASRAYLEG 120

FYEKPRIDREILREHAEGLIALSGCLGAEIPQFILQDRLD 
LAEARLNEYLSIFKDRFFIEIQNHGLPEQKKVNEVLKEFA 
RKYGLGMVATNDGHYVRKEDARAHEVLLAIQSKSTLDDPG 240

ALALPCEEFYVKTPEEMRAMFPEEEVGGRSPLTTPWRSPH 
VQRGAAIGTRWSTRIPRFPLPEGRTEAQYLMELTFKGLLR 
RYPDRITEGFYREVFRLSGKLPPHGDGEALAEALAQVERE 360

AWERLMKSLPPLAGVKEWTAEAIFHRALYELSAIERMGFP
GLLPHRPGLHQLGPEKGVSVGPGRGGAAGSLVAYAVGITN
IDPLRFGLLFERFLNPERVSMPDIDTDFSDRERDRVIQYV 480

RERYGEDKVAQIGTLGSLASKAALKEVARVYGIPRKKAEE
LAKLIPVQFGKPKPLQEAIQVVPELRAEMEKDPKVREVLE
VAMRLEGLNRHASVHAGRGGVFSEPLTDLVPLCATRKGGP 600

YTQYDMGAVEALGLLKMDFLGLRTLTFLDEVKRIVKASQG 
VELDYDALPLDDPKTFALLSRGETKGVFQLESGGMTATLR 
GLKPRRFEDLIAILSLYRPGPMEHIPTYIRRHHGLEPVSY 720

SEFPHAEKYLKPILDETYGIPVYQEQIMQIASAVAGYSLG 
EADLLRRSMGKKKVEEMKSHRERFVQGAKERGVPEEEANR 
LFDMLEAFANYGFNKSHAAAYSLLSYQTAYVKAHYPVEFM 840

AALLSVERHDSDKVAEYIRDARAMGIEVLPPDVNRSGFDF
LVQGRQILFGLSAVKNVGEAAAEAILRERERGGPYRSLGD
FLKRLDEKVLNKRTLESLIKAGALDGFGERARLLASLEGL 960

LKWAAENREKARSGMMGLFSEVEEPPLAEAAPLDEITRLR
YEKEALGIYVSGHPILRYPGLRETATCTLEELPHLARDLP
PRSRVLLAGMVEEVVRKPTKSGGMMARFVLSDETGALEAV 1080

AFGRAYDQVSPRLKEDTPVLVLAEVEREEGGVRVLAQAVW 
TYQELEQVPRALEVEVEASLPDDRGVAHLKSLLDEHAGTL 
PLYVRVQGAFGEALLALREVRVGEEALGALEAAGFPAYLL 1200

PNREVSPRLTGSGGPRGRALSTGLALKTYPIALPGGNEAL
ARPLL

FIG. 18A

ATGGTGGAGCGGGTGGTGCGGACCCTTCTGGACGGGAGGT 40
TCCTCCTGGAGGAGGGGGTGGGGCTTTGGGAGTGGCGCTA 
CCCCTTTCCCCTGGAGGGGGAGGCGGTGGTGGTCCTGGAC 120
CTGGAGACCACGGGGCTTGCCGGCCTGGACGAGGTGATTG 
AGGTGGGCCTCCTCCGCCTGGAGGGGGGGAGGCGCCTCCC 200
CTTCCAGAGCCTCGTCCGGCCCCTCCCGCCCGCCGAAGCC 
CGTTCGTGGAACCTCACCGGCATCCCCCGGGAGGCCCTGG 280
AGGAGGCCCCCTCCCTGGAGGAGGTTCTGGAGAAGGCCTA 
CCCCCTCCGCGGCGACGCCACCTTGGTGATCCACAACGCC 360
GCCTTTGACCTGGGCTTCCTCCGCCCGGCCTTGGAGGGCC 
TGGGCTACCGCCTGGAAAACCCCGTGGTGGACTCCCTGCG 440 
CTTGGCCAGACGGGGCTTACCAGGCCTTAGGCGCTACGGC 
CTGGACGCCCTCTCCGAGGTCCTGGAGCTTCCCCGAAGGA 520
CCTGCCACCGGGCCCTCGAGGACGTGGAGCGCACCCTCGC 
CGTGGTGCACGAGGTATACTATATGCTTACGTCCGGCCGT 600 
CCCCGCACGCTTTGGGAACTCGGGAGGTAG 



MVERVVRTLLDGRFLLEEGVGLWEWRYPFPLEGEAVVVLD 40
LETTGLAGLDEVIEVGLLRLEGGRRLPFQSLVRPLPPAEA
RSWNLTGIPREALEEAPSLEEVLEKAYPLRGDATLVIHNA 120
AFDLGFLRPALEGLGYRLENPVVDSLRLARRGLPGLRRYG
LDALSEVLELPRRTCHRALEDVERTLAVVHEVYYMLTSGR 200
PRTLWELGRZ
GTGTCGCACGAGGCCGTCTGGCAACACGTTCTGGAGCAGA

TCCGCCGCAGCATCACCGAGGTGGAGTTCCACACCTGGTT 

TGAAAGGATCCGCCCCTTGGGGATCCGGGACGGGGTGCTG 120 

GAGCTCGCCGTGCCCACCTCCTTTGCCCTGGACTGGATCC 

GGCGCCACTACGCCGGCCTCATCCAGGAGGGCCCTCGGCT

CCTCGGGGCCCAGGCGCCCCGGTTTGAGCTCCGGGTGGTG 240 

CCCGGGGTCGTAGTCCAGGAGGACATCTTCCAGCCCCCGC 

CGAGCCCCCCGGCCCAAGCTCAACCCGAAGATACCTTTAA 

AACTTCGTGGTGGGGCCCAACAACTCCATGGCCCCACGGC 360

GGCGCCGTGGCCGTGGCCGAGTCCCCCGGCCGGGCCTACA 

ACCCCCTCTTCATCTACGGGGGCCGTGGCCTGGGAAAGAC 

CTACCTGATGCACGCCGTGGGCCCACTCCGTGCGAAGCGC 480 

TTCCCCCACATGAGATTAGAGTACGTTTCCACGGAAACTT 

TCACCAACGAGCTCATCAACCGGCCATCCGCGAGGGACCG 

GATGACGGAGTTCCGGGAGCGGTACCGCTCCGTGGACCTC 600 

CTGCTGGTGGACGACGTCCAGTTCATCGCCGGAAAGGAGC 

GCACCCAGGAGGAGTTTTTCCACACCTTCAACGCCCTTTA 

CGAGGCCCACAAGCAGATCATCCTCTCCTCCGACCGGCCG 720

CCCAAGGACATCCTCACCCTGGAGGCGCGCCTGCGGAGCC 

GCTTTGAGTGGGGCCTGATCACCGACAATCCAGCCCCCGA 

CCTGGAAACCCGGATCGCCATCCTGAAGATGAACGCCAGC 840 

AGCGGGCCTGAGGATCCCGAGGACGCCCTGGAGTACATCG 

CCCGGCAGGTCACCTCCAACATCCGGGAGTGGGAAGGGGC 

CCTCATGCGGGCATCGCCTTTCGCCTCCCTCAACGGCGTT 960 

GAGCTGACCCGCGCCGTGGCGGCCAAGGCTCTCCGACATC 

TTCGCCCCAGGGAGCTGGAGGCGGACCCCTTGGAGATCAT 

CCGCAAAGCGGCGGGACCAGTTCGGCCTGAAACCCCGGGA 1080 

GGAGCTCACGGGGAGCGCCGCAAGAAGGAGGTGGTCCTCC 

CCCGGCAGCTCGCCATGTACCTGGTGCGGGAGCTCACCCC 

GGCCTCCCTGCCCGAGATCGACCAGCTCAACGACGACCGG 1200

GACCACACCACGGTCCTCTACGCCATCCAGAAGGTCCAGG 

AGCTCGCGGAAAGCGACCGGGAGGTGCAGGGCCTCCTCCG 

CACCCTCCGGGAGGCGTGCACATGA 
VSHEAVWQHVLEHIRRSITEVEFHTWFERIRPLGIRDGVL
ELAVPTSFALDWIRRHYAGLIQEGPRLLGAQAPRFELRW 
PGVWQEDIFQPPPSPPAQAQPEDTFKTSWWGPTTPWPHG 120
GAVAVAESPGRAYNPLFIYGGRGLGKTYLMHAVGPLRAKR 
FPHMRLEYVSTETFTNELINRPSARDRMTEFRERYRSVDL 
LLVDDVQFIAGKERTQEEFFHTFNALYEAHKQIILSSDRP 240 
PKDILTLEARLRSRFEWGLITDNPAPDLETRIAILKMNAS 
SGPEDPEDALEYIARQVTSNIREWEGALMRASPFASLNGV 
ELTRAVAAKALRHLRPRELEADPLEIIRKAAGPVRPETPG 360
GAHGERRKKEWLPRQLAMYLVRELTPASLPEIDQLNDDR
DHTTVLYAIQKVQELAESDREVQGLLRTLREACT 
ATGAACATAACGGTTCCCAAAAAACTCCTCTCGGACCAGC 40
TTTCCCTCCTGGAGCGCATCGTCCCCTCTAGAAGCGCCAA 
CCCCCTCTACACCTACCTGGGGCTTTACGCCGAGGAAGGG 12 0 
GCCTTGATCCTCTTCGGGACCAACGGGGAGGTGGACCTCG 
AGGTCCGCCTCCCCGCCGAGGCCCAAAGCCTTCCCCGGGT 20 0 
GCtCGTCCCCGCCCAGCCCTTCTTCCAGCTGGTGCGGAGC 
CTTCCTGGGGACCTCGTGGCCCTCGGCCTCGCCTCGGAGC 2 8 0 
CGGGCCAGGGGGGGCAGCTGGAGCTCTCCTCCGGGCGTTT 
CCGCACCCGGCTCAGCCTGGCCCCTGCCGAGGGCTACCCC 3 6 0 
GAGCTTCTGGTGCCCGAGGGGGAGGACAAGGGGGCCTTCC 
CCCTCCGGACGCGGATGCCCTCCGGGGAGCTCGTCAAGGC 4 4 0 
CTTGACCCACGTGCGCTACGCCGCGAGCAACGAGGAGTAC 
CGGGCCATCTTCCGCGGGGTGCAGCTGGAGTTCTCCCCCC 52 0 
AGGGCTTCCGGGCGGTGGCCTCCGACGGGTACCGCCTCGC 
CCTCTACGACCTGCCCCTGCCCCAAGGGTTCCAGGCCAAG 6 00 
GCCGTGGTCCCCGCCCGGAGCGTGGACGAGATGGTGCGGG 
TCCTGAAGGGGGCGGACGGGGCCGAGGCCGTCCTCGCCCT 680 
GGGCGAGGGGGTGTTGGCCCTGGCCCTCGAGGGCGGAAGC 
GGGGTCCGGATGGCCCTCCGCCTCATGGAAGGGGAGTTCC 7 60 
CCGACTACCAGAGGGTCATCCCCCAGGAGTTCGCCCTCAA 
GGTCCAGGTGGAGGGGGAGGCCCTCAGGGAGGCGGTGCGC 840 
CGGGTGAGCGTCCTCTCCGACCGGCAGAACCACCGGGTGG 
ACCTCCTTTTGGAGGAAGGCCGGATCCTCCTCTCCGCCGA 920 
GGGGGACTACGGCAAGGGGCAGGAGGAGGTGCCCGCCCAG 
GTGGAGGGGCCGGACATGGCCGTGGCCTACAACGCCCGCT 1000 
ACCTCCTCGAGGCCCTCGCCCCCGTGGGGGACCGGGCCCA 
CCTGGGCATCTCCGGGCCCACGAGCCCGAGCCTCATCTGG 10 8 0 
GGGGACGGGGAGGGGTACCGGGCGGTGGTGGTGCCCCTCA 
GGGTCTAG 112 8 
MNITVPKKLLSDQLSLLERIVPSRSANPLYTYLGLYAEEG 40
ALILFGTNGEVDLEVRLPAEAQSLPRVLVPAQPFFQLVRS 
LPGDLVALGLASEPGQGGQLELSSGRFRTRLSLAPAEGYP 120
ELLVPEGEDKGAFPLRTRMPSGELVKALTHVRYAASNEEY

RAIFRGVQLEFSPQGFRAVASDGYRLALYDLPLPQGFQAK 200
AVVPARSVDEMVRVLKGADGAEAVLALGEGVLALALEGGS
GVRMALRLMEGEFPDYQRVIPQEFALKVQVEGEALREAVR 280
RVSVLSDRQNHRVDLLLEEGRILLSAEGDYGKGQEEVPAQ
VEGPDMAVAYNARYLLEALAPVGDRAHLGISGPTSPSLIW 360
GDGEGYRAVVVPLRVZ
ATGAGTAAGGATTTCGTCCACCTTCACCTGCACACCCAGTTCTCACTCCT
GGACGGGGCTATAAAGATAGACGAGCTCGTGAAAAAGGCAAAGGAGTATG 10 0 
GATACAAAGCTGTCGGAATGTCAGACCACGGAAACCTCTTCGGTTCGTAT 
AAATTCTACAAAGCCCTGAAGGCGGAAGGAATTAAGCCCATAATCGGCAT 2 0 0 
GGAAGCCTACTTTACCACGGGTTCGAGGTTTGACAGAAAGACTAAAACGA 
GCGAGGACAACATAACCGACAAGTACAACCACCACCTCATACTTATAGCA 3 00 
AAGGACGAAAAGGTCTAAAGAACTTAATGAAGCTCTCAACCCTCGCCTAC 
AAAGAAGGTTTTTACTACAAACCCAGAATTGATTACGAACTCCTTGAAAA 4 00 
GTACGGGGAGGGCCTAATAGCCCTTACCGCATGCCTGAAAGGTGTTCCCA 
CCTACTACGCTTCTATAAACGAAGTGAAAAAGGCGGAGGAATGGGTAAAG 500 
AAGTTCAAGGATATATTCGGAGATGACCTTTATTTAGAACTTCAAGCGAA 
CAACATTCCAGAACAGGAAGTGGCAAACAGGAACTTAATAGAGATAGCCA 6 0 0 
AAAAGTACGATGTGAAACTCATAGCGACGCAGGACGCCCACTACCTCAAT 
CCCGAAGACAGGTACGCCCACACGGTTCTTATGGCACTTCAAATGAAAAA 7 0 0 
GACCATTCACGAACTGAGTTCGGGAAACTTCAAGTGTTCAAACGAAGACC 
TTCACTTTGCTCCACCCGAGTACATGTGGAAAAAGTTTGAAGGTAAGTTC 8 00 
GAAGGCTGGGAAAAGGCACTCCTGAACACTCTCGAGGTAATGGAAAAGAC 
AGCGGACAGCTTTGAGATATTTGAAAACTCCACCTACCTCCTTCCCAAGT 900 
ACGACGTTCCGCCCGACAAAACCCTTGAGGAATACCTCAGAGAACTCGCG 
TACAAAGGTTTAAGACAGAGGATAGAAAGGGGACAAGCTAAGGATACTAA 10 00 
AGAGTACTGGGAGAGGCTCGAGTACGAACTGGAAGTTATAAACAAAATGG 
GCTTTGCGGGATACTTCTTGATAGTTCAGGACTTCATAAACTGGGCTAAG 110 0 
AAAAACGACATACCTGTTGGACCCGGAAGGGGAAGTGCTGGAGGTTCCCT 
CGTCGCATACGCCATCGGAATAACGGACGTTGACCCTATAAAGCACGGAT 12 0 0 
TCCTTTTTGAGAGGTTCTTAAACCCCGAAAGGGTTTCCATGCCGGATATA 
GACGTGGATTTCTGTCAGGACAACAGGGAAAAGGTCATAGAGTACGTAAG 13 0 0 
GAACAAGTACGGACACGACAACGTAGCTCAGATAATCACCTACAACGTAA 
TGAAGGCGAAGCAAACACTGAGAGACGTCGCAAGGGCCATGGGACTCCCC 14 0 0 
TACTCCACCGCGGACAAACTCGCAAAACTCATTCCTCAGGGGGACGTTCA 
GGGAACGTGGCTCAGTCTGGAAGAGATGTACAAAACGCCTGTGGAGGAAC 150 0 
TCCTTCAGAAGTACGGAGAACACAGAACGGACATAGAGGACAACGTAAAG 
AAGTTCAGACAGATATGCGAAGAAAGTCCGGAGATAAAACAGCTCGTTGA 16 0 0 
GACGGCCCTGAAGCTTGAAGGTCTCACGAGACACACCTCCCTCCACGCCG 
CGGGAGTGGTTATAGCACCAAAGCCCTTGAGCGAGCTCGTTCCCCTCTAC 170 0 
TACGATAAAGAGGGCGAAGTCGCAACCCAGTACGACATGGTTCAGCTCGA 
AGAACTCGGTCTCCTGAAGATGGACTTCCTCGGACTCAAAACCCTCACAG 18 0 0 
AACTGAAACTCATGAAAGAACTCATAAAGGAAAGACACGGAGTGGATATA
AACTTCCTTGAACTTCCCCTTGACGACCCGAAAGTTTACAAACTCCTTCA 19 0 0 
GGAAGGAAAAACCACGGGAGTGTTCCAGCTCGAAAGCAGGGGAATGAAAG 
AACTCCTGAAGAAACTAAAGCCCGACAGCTTTGACGACATCGTTGCGGTC 2 0 0 0 
CTCGCACTCTACAGACCCGGACCTCTAAAGAGCGGACTCGTTGACACATA 
CATTAAGAGAAAGCACGGAAAAGAACCCGTTGAGTACCCCTTCCCGGAGC 210 0 
TTGAACCCGTCCTTAAGGAAACCTACGGAGTAATCGTTTATCAGGAACAG 
GTGATGAAGATGTCTCAGATACTTTCCGGCTTTACTCCCGGAGAGGCGGA 22 0 0 
TACCCTCAGAAAGGCGATAGGTAAGAAGAAAGCGGATTTAATGGCTCAGA 
TGAAAGACAAGTTCATACAGGGAGCGGTGGAAAGGGGATACCCTGAAGAA 2 3 0 0 
AAGATAAGGAAGCTCTGGGAAGACATAGAGAAGTTCGCTTCCTACTCCTT 
CAACAAGTCTCACTCGGTAGCTTACGGGTACATCTCCTACTGGACCGCCT 24 0 0 
ACGTTAAAGCCCACTATCCCGCGGAGTTCTTCGCGGTAAAACTCACAACT

GAAAAGAACGACAACAAGTTCCTCAACCTCATAAAAGACGCTAAACTCTT 2 5 0 0 
CGGATTTGAGATACTTCCCCCCGACATAAACAAGAGTGATGTAGGATTTA 

CGATAGAAGGTGAAAACAGGATAAGGTTCGGGCTTGCGAGGATAAAGGGA 2 6 0 0 
GTGGGAGAGGAAACTGCTAAGATAATCGTTGAAGCTAGAAAGAAGTATAA 

GCAGTTCAAAGGGCTTGCGGACTTCATAAACAAAACCAAGAACAGGAAGA 2 7 0 0 
TAAACAAGAAAGTCGTGGAAGCACTCGTAAAGGCAGGGGCTTTTGACTTT 

ACTAAGAAAAAGAGGAAAGAACTACTCGCTAAAGTGGCAAACTCTGAAAA 2 8 00 
AGCATTAATGGCTACACAAAACTCCCTTTTCGGTGCACCGAAAGAAGAAG 

TGGAAGAACTCGACCCCTTAAAGCTTGAAAAGGAAGTTCTCGGTTTTTAC 2 9 0 0 
ATTTCAGGGCACCCCCTTGACAACTACGAAAAGCTCCTCAAGAACCGCTA 

CACACCCATTGAAGATTTAGAAGAGTGGGACAAGGAAAGCGAAGCGGTGC 3 0 00 
TTACAGGAGTTATCACGGAACTCAAAGTAAAAAAGACGAAAAACGGAGAT 

TACATGGCGGTCTTCAACCTCGTTGACAAGACGGGACTAATAGAGTGTGT 310 0 
CGTCTTCCCGGGAGTTTACGAAGAGGCAAAGGAACTGATAGAAGAGGACA 

GAGTAGTGGTAGTCAAAGGTTTTCTGGACGAGGACCTTGAAACGGAAAAT 3200
GTCAAGTTCGTGGTGAAAGAGGTTTTCTCCCCTGAGGAGTTCGCAAAGGA 

GATGAGGAATACCCTTTATATATTCTTAAAAAGAGAGCAAGCCCTAAACG 3 3 00 
GCGTTGCCGAAAAACTAAAGGGAATTATTGAAAACAACAGGACGGAGGAC 

GGATACAACTTGGTTCTCACGGTTGATCTGGGAGACTACTTCGTTGATTT 34 00 
AGCACTCCCACAAGATATGAAACTAAAGGCTGACAGAAAGGTTGTAGAGG 

AGATAGAAAAACTGGGAGTGAAGGTCATAATTTAGTAAATAACCCTTACT 3 5 00 
TCCGAGTAGTCCCC 
MSKDFVHLHLHTQFSLLDGAIKIDELVKKAKEYGYKAVGMSDHGNLFGSY
KFYKALKAEGIKPIIGMEAYFTTGSRFDRKTKTSEDNITDKYNHHLILIA 100
KDDKGLKNLMKLSTLAYKEGFYYKPRIDYELLEKYGEGLIALTACLKGVP 
TYYASINEVKKAEEWVKKFKDIFGDDLYLELQANNIPEQEVANRNLIEIA 200 
KKYDVKLIATQDAHYLNPEDRYAHTVLMALQMKKTIHELSSGNFKCSNED
LHFAPPEYMWKKFEGKFEGWEKALLNTLEVMEKTADSFEIFENSTYLLPK 3 0 0 
YDVPPDKTLEEYLRELAYKGLRQRIERGQAKDTKEYWERLEYELEVINKM 
GFAGYFLIVQDFINWAKKNDIPVGPGRGSAGGSLVAYAIGITDVDPIKHG 400
FLFERFLNPERVSMPDIDVDFCQDNREKVIEYVRNKYGHDNVAQIITYNV
MKAKQTLRDVARAMGLPYSTADKLAKLIPQGDVQGTWLSLEEMYKTPVEE 5 0 0 
LLQKYGEHRTDIEDNVKKFRQICEESPEIKQLVETALKLEGLTRHTSLHA
AGWIAPKPLSELVPLYYDKEGEVATQYDMVQLEELGLLKMDFLGLKTLT 6 0 0 
ELKLMKELIKERHGVDINFLELPLDDPKVYKLLQEGKTTGVFQLESRGMK 
ELLKKLKPDSFDDIVAVLALYRPGPLKSGLVDTYIKRKHGKEPVEYPFPE 7 0 0 
LEPVLKETYGVIVYQEQVMKMSQILSGFTPGEADTLRKAIGKKKADLMAQ 
MKDKFIQGAVERGYPEEKIRKLWEDIEKFASYSFNKSHSVAYGYISYWTA 800 
YVKAHYPAEFFAVKLTTEKNDNKFLNLIKDAKLFGFEILPPDINKSDVGF 
TIEGENRIRFGLARIKGVGEETAKIIVEARKKYKQFKGLADFINKTKNRK 900
INKKWEALVKAGAFDFTKKKRKELLAKVANSEKALMATQNSLFGAPKEE 
VEELDPLKLEKEVLGFYISGHPLDNYEKLLKNRYTPIEDLEEWDKESEAV 1000
LTGVITELKVKKTKNGDYMAVFNLVDKTGLIECWFPGVYEEAKELIEED
RVWVKGFLDEDLETENVKFWKEVFSPEEFAKEMRNTLYIFLKREQALN 110 0 
GVAEKLKGIIENNRTEDGYNLVLTVDLGDYFVDLALPQDMKLKADRKWE 
EIEKLGVKVII 1161
ATGAACTACGTTCCCTTCGCGAGAAAGTACAGACCGAAATTCTTCAGGGA
AGTAATAGGACAGGAAGCTCCCGTAAGGATACTCAAAAACGCTATAAAAA 100
ACGACAGAGTGGCTCACGCCTACCTCTTTGCCGGACCGAGGGGGGTTGGG 
AAGACGACTATTGCAAGAATTCTCGCAAAAGCTTTGAACTGTAAAAATCC 200
CTCCAAAGGTGAGCCCTGCGGTGAGTGCGAAAACTGCAGGGAGATAGACA 
GGGGTGTGTTCCCTGACTTAATTGAAATGGATGCCGCCTCAAACAGGGGT 3 00 
ATAGACGACGTAAGGGCATTAAAAGAAGCGGTCAATTACAAACCTATAAA 
AGGAAAGTACAAGGTTTACATAATAGACGAAGCTCACATGCTCACGAAAG 4 00 
AAGCTTTCAACGCTCTCTTAAAAACCCTCGAAGAGCCCCCTCCCAGAACT 
GTTTTCGTCCTTTGTACCACGGAGTACGACAAAATTCTTCCCACGATACT 5 0 0 
CTCAAGGTGTCAGAGGATAATCTTCTCAAAGGTAAGAAAGGAAAAAGTAA
TAGAGTATCTAAAAAAGATATGTGAAAAGGAAGGGATTGAGTGCGAAGAG 6 00 
GGAGCCCTTGAGGTTCTGGCTCATGCCTCTGAAGGGTGCATGAGGGATGC 
AGCCTCTCTCCTGGACCAGGCGAGCGTTTACGGGGAAGGCAGGGTAACAA 700 
AAGAAGTAGTGGAGAACTTCCTCGGAATTCTCAGTCAGGAAAGCGTTAGG 
AGTTTTCTGAAATTGCTTCTGAACTCAGAAGTGGACGAAGCTATAAAGTT 80 0 
CCTCAGAGAACTCTCAGAAAAGGGCTACAACCTGACCAAGTTTTGGGAGA 
TGTTAGAAGAGGAAGTGAGAAACGCAATTTTAGTAAAGAGCCTGAAAAAT 9 0 0 
CCCGAAAGCGTGGTTCAGAACTGGCAGGATTACGAAGACTTCAAAGACTA 
CCCTCTGGAAGCCCTCCTCTACGTTGAGAACCTGATAAACAGGGGTAAAG 1000 
TTGAAGCGAGAACGAGAGAACCCTTAAGAGCCTTTGAACTCGCGGTAATA 
AAGAGCCTTATAGTCAAAGACATAATTCCCGTATCCCAGCTCGGAAGTGT 1100
GGTAAAGGAAACCAAAAAGGAAGAAAAGAAAGTTGAAGTAAAAGAAGAGC
CAAAAGTAAAAGAAGAAAAACCAAAGGAGCAGGAAGAGGACAGGTTCCAG 1200
AAAGTTTTAAACGCTGTGGACGGCAAAATCCTTAAAAGAATACTTGAAGG
GGCAAAAAGGGAAGAAAGAGACGGAAAAATCGTCCTAAAGATAGAAGCCT 13 0 0 
CTTATCTGAGAACCATGAAAAAGGAATTTGACTCACTAAAGGAGACTTTT 
CCTTTTTTAGAGTTTGAACCCGTGGAGGATAAAAAAAAACCTCAGAAGTC 14 0 0 
CAGCGGGACGAGGCTGTTTTAAAGGTAAAGGAGCTCTTCAATGCAAAAAT 
ACTCAAAGTACGAAGTAAAAGCTAAGGTCATAAAGGTGAGAATGCCCGTG 15 0 0 
GAAGAGATAGGGCTGTTTAACGCACTAATAGACGGCTTGCCCAGGTACGC
ACTCACGAGGACGAAGGAAAAGGGAAAGGGAGAAGTTTTCGTTTTAGCGA 16 0 0 
CTCCTTATAAAGTCAAGGAATTGATGGAAGCTATGGAGGGTATGAAAAAA 
CACATAAAGGATTTAGAAATCCTCGGAGAGACGGATGAGGATTTAACTTT 170 0 
TTAAAGTATGGGTGTATCTGAGCAAAGGTTTAAGCTAAAAACAAACCTGA 
AACCCGCAGGGGACCAGCCGAAAGCCATAAAAAAACTCCTTGAAAACCTA 18 0 0 
AGGAAAGGCGTAAAAGAACAAACACTTCTCGGAGTCACGGGAAGCGGAAA 
GACTTTTACTCTAGCAAACGTAATAGCGAAGTACAACAAACCAACTCTTG 19 0 0 
TGGTAGTTCACAACAAAATTCTCGCGGCACAGCTATACAGGGAGTTTAAA 
GAACTATTCCCTGAAAACGCTGTAGAGTACTTTGTCTCTTACTACGACTA 2 0 0 0 
TTACCAACCTGAAGCCTACATTCCCGAAAAAGATTTATACATAGAAAAGG
ACGCGAGTATAAACGAAAGCTGGAACGTTTCAGACACTCCGCCACGATAT 210 0 
CCGTTCTAGAAAGGAGGGACGTTATAGTAGTTGCTTCAGTTTCTTGCATA 
TACGGACTCGGGAAACCTGAGCACTACGAAAACCTGAGGATAAAACTCCA 22 0 0 
AAGGGGAATAAGACTGAACTTGAGTAAGCTCCTGAGGAAACTCGTTGAGC 
TAGGATATCAGAGAAATGACTTTGCCATAAAGAGGGCTACCTTCTCGGTT 23 0 0 
AGGGGAGACGTGGTTGAGATAGTCCCTTCTCACACGGAAGATTACCTCGT 
GAGGGTAGAGTTCTGGGACGACGAAGTTGAAAGAATAGTCCTCATGGACG 24 0 0 
CTCTGAAC 
MNYVPFARKYRPKFFREVIGQEAPVRILKNAIKNDRVAHAYLFAGPRGVG

KTTIARILAKALNCKNPSKGEPCGECENCREIDRGVFPDLIEMDAASNRG 100 
IDDVRALKEAVNYKPIKGKYKVYIIDEAHMLTKEAFNALLKTLEEPPPRT

VFVLCTTEYDKILPTILSRCQRIIFSKVRKEKVIEYLKKICEKEGIECEE 200
GALEVLAHASEGCMRDAASLLDQASVYGEGRVTKEWENFLGILSQESVR 

SFLKLLLNSEVDEAIKFLRELSEKGYNLTKFWEMLEEEVRNAILVKSLKN 3 0 0 
PESWQNWQDYEDFKDYPLEALLYVENLINRGKVEARTREPLRAFELAVI 

KSLIVKDIIPVSQLGSWKETKKEEKKVEVKEEPKVKEEKPKEQEEDRFQ 400
KVLNAVDGKILKRILEGAKREERDGKIVLKIEASYLRTMKKEFDSLKETF

PFLEFEPVEDKKKPQKSSGTRLF 473
ATGCGCGTTAAGGTGGACAGGGAGGAGCTTGAAGAGGTTCTTAAAAAAGC
AAGAGAAAGCACGGAAAAAAAAGCCGCACTCCCGATACTCGCGAACTTCT 10 0 
TACTCTCCGCAAAAGAGGAAAACTTAATCGTAAGGGCAACGGACTTGGAA 
AACTACCTTGTAGTCTCCGTAAAGGGGGAGGTTGAAGAGGAAGGAGAGGT 2 0 0 
TTGCGTCCACTCTCAAAAACTCTACGATATAGTCAAGAACTTAAATTCCG 
CTTACGTTTACCTTCATACGGAAGGTGAAAAACTCGTCATAACGGGAGGA 3 0 0 
AAGAGTACGTACAAACTTCCGACAGCTCCCGCGGAGGACTTTCCCGAATT 
TCCAGAAATCGTAGAAGGAGGAGAAACACTTTCGGGAAACCTTCTCGTTA 4 0 0 
ACGGAATAGAAAAGGTAGAGTACGCCATAGCGAAGGAAGAAGCGAACATA 
GCCCTTCAGGGAATGTATCTGAGAGGATACGAGGACAGAATTCACTTTGT 5 0 0 
GTTCGGACGGTCACAGGCTTGCACTTTATGAACCTCTACGTAAACATTGA 
AAAGAGTGAAGACGAGTCTTTTGCTTACTTCTCCACTCCCGAGTGGAAAC 6 0 0 
TCGCCGTTAGCTCCTGGAAGGAGAATTCCCGGACTACATGAGTGTCATCC 
CTGAGGAGTTTTCGGCGGAAGTCTTGTTTGAGACAGAGGAAGTCTTAAAG 70 0 
GTTTTAAAGAGGTTGAAGGCTTTAAGCGAAGGAAAAGTTTTTCCCGTGAA 
GATTACCTTAAGCGAAAACCTTGCCATCTTTGAGTTCGCGGATCCGGAGT 8 0 0 
TCGGAGAAGCGAGAGAGGAAATTGAAGTGGAGTACACGGGAGAGCCCTTT 
GAGATAGGATTCAACGGAAATACCTTATGGAGGCGCTTGACGCCTACGAC 9 0 0 
AGCGAAAGAGTGTGGTTCAAGTTCACAACCCCCGACACGGCCACTTTATT 
GGAGGCTGAAGATTACGAAAAGGAACCTTACAAGTGCATAATAATGCCGA 10 0 0 

TTTTAATTCCTGCGTTTAGCGAAGCCAAACCCAAGTCTTC 10 9 0 
MRVKVDREELEEVLKKARESTEKKAALPILANFLLSAKEENLIVRATDLE
NYLWSVKGEVEEEGEVCVHSQKLYDIVKNLNSAYVYLHTEGEKLVITGG 10 0 
KSTYKLPTAPAEDFPEFPEIVEGGETLSGNLLWGIEKVEYAIAKEEANI 
ALQGMYLRGYEDRIHFVGSDGHRLALYEPLGEFSKELLIPRKSLKVLKKL 2 00 
ITGIEDVNIEKSEDESFAYFSTPEWKLAVRLLEGEFPDYMSVIPEEFSAE 
VLFETEEVLKVLKRLKALSEGKVFPVKITLSENLAIFEFADPEFGEAREE 3 0 0 
IEVEYTGEPFEIGFNGKYLMEALDAYDSERVWFKFTTPDTATLLEAEDYE 
KEPYKCIIMPMRV 363
GAAGGAGAGGGTCTTCGTCCTTCATGGAGAAGAGCAGTATCTCATAAGAA 100
CCTTTTTGTCTAAGCTGAAGGAAAAGTACGGGGAGAATTACACGGTTCTG 
TGGGGGGATGAGATAAGCGAGGAGGAATTCTACACTGCCCTTTCCGAGAC 2 00 
CAGTATATTCGGCGGTTCAAAGGAAAAAGCGGTGGTCATTTACAACTTCG 
GGGATTTCCTGAAGAAGCTCGGAAGGAAGAAAAAGGAAAAAGAAAGGCTT 3 00 
ATAAAAGTCCTCAGAAACGTAAAGAGTAACTACGTATTTATAGTGTACGA 
TGCGAAACTCCAGAAACAGGAACTTTCTTCGGAACCTCTGAAATCCGTAG 4 0 0 
CGTCTTTCGGCGGTATAGTGGTAGCAAACAGGCTGAGCAAGGAGAGGATA
AAACAGCTCGTCCTTAAGAAGTTCAAAGAAAAAGGGATAAACGTAGAAAA 5 0 0 
CGATGCCCTTGAATACCTTCTCCAGCTCACGGGTTACAACTTGATGGAGC 
TCAAACTTGAGGTTGAAAAACTGATAGATTACGCAAGTGAAAAGAAAATT 6 0 0 
TTAACACTCGATGAGGTAAAGAGAGTAGCCTTCTCAGTCTCAGAAAACGT
AAACGTATTTGAGTTCGTTGATTTACTCCTCTTAAAAGATTACGAAAAGG 7 0 0 
CTCTTAAAGTTTTGGACTCCCTCATTTCCTTCGGAATACACCCCCTCCAG 
ATTATGAAAATCCTGTCCTCCTATGCTCTAAAACTTTACACCCTCAAGAG 8 0 0 
GCTTGAAGAGAAGGGAGAGGACCTGAATAAGGCGATGGAAAGCGTGGGAA 
TAAAGAACAACTTTCTCAAGATGAAGTTCAAATCTTACTTAAAGGCAAAC 9 0 0 
TCTAAAGAGGACTTGAAGAACCTAATCCTCTCCCTCCAGAGGATAGACGC
TTTTTCTAAACTTTACTTTCAGGACACAGTGCAGTTGCTGGGGATTTCTT 10 0 0 
GACCTCAAGACTGGAGAGGGAAGTTGTGAAAAATACTTCTCATGGTGGAT
AATCTTTTTTATGAAGTTTGCGGTTTGCGTTTTTCCCGGTTCT 1093
VETTIFQFQKTFFTKPPKERVFVLHGEEQYLIRTFLSKLKEKYGENYTVL

WGDEISEEEFYTALSETSIFGGSKEKAWIYNFGDFLKKLGRKKKEKERL 10 0 

IKVLRNVKSNYVFIVYDAKLQKQELSSEPLKSVASFGGIWANRLSKERI 

KQLVLKKFKEKGINVENDALEYLLQLTGYNLMELKLEVEKLIDYASEKKI 2 0 0 

LTLDEVKRVAFSVSENVNVFEFVDLLLLKDYEKALKVLDSLISFGIHPLQ 

IMKILSSYALKLYTLKRLEEKGEDLNKAMESVGIKNNFLKMKFKSYLKAN 300

SKEDLKNLILSLQRIDAFSKLYFQDTVQLLRDFLTSRLEREWKNTSHGG 



FIG. 41 



ATGGAAAAAGTTTTTTTGGAAAAACTCCAGAAAACCTTGCACATACCCGG 

AGGACTCCTTTTTTACGGCAAAGAAGGAAGCGGAAAGACGAAAACAGCTT 10 0 

TTGAATTTGCAAAAGGTATTTTATGTAAGGAAAACGTACCTGGGGATGCG 

GAAGTTGTCCCTCCTGCAAACACGTAAACGAGCTGGAGGAAGCCTTCTTT 2 0 0 

AAAGGAGAAATAGAAGACTTTAAAGTTTATAAGACAAGGACGGTAAAAAG 

CACTTCGTTTACCTTATGGGCGAACATCCCGACTTTGTGGTAATAATCCC 3 00 

GAGCGGACATTACATAAAGATAGAACAGATAAGGGAAGTTAAGAACTTTG

CCTATGTGAAGCCCGCACTAAGCAGGAGAAAAGTAATTATAATAGACGAC 400

GCCCACGCGATGACCTCTCAGGCGGCAAACGCTCTTTTAAAGGTATTGGA 

AGAGCCACCTGCGGACACCACCTTTATCTTGACCACGAACAGGCGTTCTG 5 0 0 

CAATCCTGCCGACTATCCTCTCCAGAACTTTTCAAGTGGAGTTCAAGGGC 

TTTTCAGTAAAAGAGGTTATGGAAATAGCGAAAGTAGACGAGGAAATAGC 60 0 

GAAACTCTCTGGAGGCAGTCTAAAAAGGGCTATCTTACTAAAGGAAAACA 

AAGATATCCTAAACAAAGTAAAGGAATTCTTGGAAAACGAGCCGTTAAAA 7 0 0 

GTTTACAAGCTTGCAAGTGAATTCGAAAAGTGGGAACCTGAAAAGCAAAA 

ACTCTTCCTTGAAATTATGGAAGAATTGGTATCTCAAAAATTGACCGAAG 80 0 

AGAAAAAAGACAATTACACCTACCTTCTTGATACGATCAGACTCTTTAAA

GACGGACTCGCAAGGGGTGTAAACGAACCTCTGTGGCTGTTTACGTTAGC 9 0 0 

CGTTCAGGCGGATTAATAAACCGTTATTGATTCCGTAACATTTAAACCTT 

AATCTAAATTATGAGAGCCTTTGAAGGAGGTCTGGTATGGAAAATTTGAA 10 0 0 

GATTAGATATATAGATACGAGGAAGATAGGAACCGTGAGCGGTGTAAAAG 

T 1051 



FIG. 42 



MEKVFLEKLQKTLHIPGGLLFYGKEGSGKTKTAFEFAKGILCKENVPWGC 
GSCPSCKHVNELEEAFFKGEIEDFKVYKDKDGKKHFVYLMGEHPDFWII 
PSGHYIKIEQIREVKNFAYVKPALSRRKVIIIDDAHAMTSQAANALLKVL
EEPPADTTFILTTNRRSAILPTILSRTFQVEFKGFSVKEVMEIAKVDEEI 
AKLSGGSLKRAILLKENKDILNKVKEFLENEPLKVYKLASEFEKWEPEKQ 
KLFLEIMEELVSQKLTEEKKDNYTYLLDTIRLFKDGLARGVNEPLWLFTL 
AVQAD 
ATGAACTTCCTGAAAAAGTTCCTTTTACTGAGAAAAGCTCAAAAGTCTCC

TTACTTCGAAGAGTTCTACGAAGAAATCGATTTGAACCAGAAGGTGAAAG 10 0 
ATGCAAGGTTTGTAGTTTTTGACTGCGAAGCCACAGAACTCGACGTAAAG 

AAGGCAAAACTCCTTTCAATAGGTGCGGTTGAGGTTAAAAACCTGGAAAT 2 0 0 
AGACCTCTCTAAATCTTTTTACGAGATACTCAAAAGTGACGAGATAAAGG 

CGGCGGAGATACATGGAATAACCAGGGAAGACGTTGAAAAGTACGGAAAG 3 0 0 
GAACCAAAGGAAGTAATATACGACTTTCTGAAGTACATAAAGGGAAGCGT

TCTCGTTGGCTACTACGTGAAGTTTGACGTCTCACTCGTTGAGAAGTACT 4 0 0 
CCATAAAGTACTTCCAGTATCCAATCATCAACTACAAGTTAGACCTGTTT 

AGTTTCGTGAAGAGAGAGTACCAGAGTGGCAGGAGTCTTGACGACCTTAT 500 
GAAGGAACTCGGTGTAGAAATAAGGGCAAGGCACAACGCCCTTGAAGATG 

CCTACATAACCGCTCTTCTTTTCCTAAAGTACGTTTACCCGAACAGGGAG 6 0 0 
TACAGACTAAAGGATCTCCCGATTTTCCTT 
MNFLKKFLLLRKAQKSPYFEEFYEEIDLNQKVKDARFVVFDCEATELDVK
KAKLLSIGAVEVKNLEIDLSKSFYEILKSDEIKAAEIHGITREDVEKYGK 100
EPKEVIYDFLKYIKGSVLVGYYVKFDVSLVEKYSIKYFQYPIINYKLDLF
SFVKREYQSGRSLDDLMKELGVEIRARHNALEDAYITALLFLKYVYPNRE 200
YRLKDLPIFL 
ATGCTCAATAAGGTTTTTATAATAGGAAGACTTACGGGTGACCCCGTTAT

AACTTATCTACCGAGCGGAACGCCCGTAGTAGAGTTTACTCTGGCTTACA 100 
ACAGAAGGTATAAAAACCAGAACGGTGAATTTCAGGAGGAAAGTCACTTC

TTTGACGTAAAGGCGTACGGAAAAATGGCTGAAGACTGGGCTACACGCTT 200 
CTCGAAAGGATACCTCGTACTCGTAGAGGGAAGACTCTCCCAGGAAAAGT

GGGAGAAAGAAGGAAAGAAGTTCTCAAAGGTCAGGATAATAGCGGAAAAC 300
GTAAGATTAATAAACAGGCCGAAAGGTGCTGAACTTCAAGCAGAAGAAGA 

GGAGGAAGTTCCTCCCATTGAGGAGGAAATTGAAAAACTCGGTAAAGAGG 4 0 0 
AAGAGAAGCCTTTTACCGATGAAGAGGACGAAATACCTTTTTAATTTTGA 

GGAGGTTAAAGTATGGTAGTGAGAGCTCCTAAGAAGAAAGTTTGTATGTA 5 0 0 
CTGTGAACAAAAGAGAGAGCCAGATT



FIG. 46 



MLNKVFIIGRLTGDPVITYLPSGTPVVEFTLAYNRRYKNQNGEFQEESHF
FDVKAYGKMAEDWATRFSKGYLVLVEGRLSQEKWEKEGKKFSKVRIIAEN 100
VRLINRPKGAELQAEEEEEVPPIEEEIEKLGKEEEKPFTDEEDEIPF 
ATGCAATTTGTGGATAAACTTCCCTGTGACGAATCCGCCGAGAGGGCGGT

TCTTGGCAGTATGCTTGAAGACCCCGAAAACATACCTCTGGTACTTGAAT 100
ACCTTAAAGAAGAAGACTTCTGCATAGACGAGCACAAGCTACTTTTCAGG 

GTTCTTACAAACCTCTGGTCCGAGTACGGCAATAAGCTCGATTTCGTATT 200 
AATAAAGGATCACCTTGAAAAGAAAAACTTACTCCAGAAAATACCTATAG

ACTGGCTCGAAGAACTCTACGAGGAGGCGGTATCCCCTGACACGCTTGAG 300
GAAGTCTGCAAAATAGTAAAACAACGTTCCGCACAGAGGGCGATAATTCA 

ACTCGGTATAGAACTCATTCACAAAGGAAAGGAAAACAAAGACTTTCACA 4 0 0 
CATTAATCGAGGAAGCCCAGAGCAGGATATTTTCCATAGCGGAAAGTGCT 

ACATCTACGCAGTTTTACCATGTGAAAGACGTTGCGGAAGAAGTTATAGA 50 0 
ACTCATTTATAAATTCAAAAGCTCTGACAGGCTAGTCACGGGACTCCCAA 

GCGGTTTCACGGAACTCGATCTAAAGACGACGGGATTCCACCCTGGAGAC 6 0 0 
TTAATAATACTCGCCGCAAGACCCGGTATGGGGAAAACCGCCTTTATGCT 

CTCCATAATCTACAATCTCGCAAAAGACGAGGGAAAACCCTCAGCTGTAT 700 
TTTCCTTGGAAATGAGCAAGGAACAGCTCGTTATGAGACTCCTCTCTATG 

ATGTCGGAGGTCCCACTTTTCAAGATAAGGTCTGGAAGTATATCGAATGA 8 0 0 
AGATTTAAAGAAGCTTGAAGCAAGCGCAATAGAACTCGCAAAGTACGACA 

TATACCTCGACGACACACCCGCTCTCACTACAACGGATTTAAGGATAAGG 900 
GCAAGAAAGCTCAGAAAGGAAAAGGAAGTTGAGTTCGTGGCGGTGGACTA 

CTTGCAACTTCTGAGACCGCCAGTCCGAAAGAGTTCAAGACAGGAGGAAG 10 0 0 
TGGCAGAGGTTTCAAGAAACTTAAAAGCCCTTGCAAAGGAACTTCACATT

CCCGTTATGGCACTTGCGCAGCTCTCCCGTGAGGTGGAAAAGAGGAGTGA 1100 
TAAAAGACCCCAGCTTGCGGACCTCAGAGAATCCGGACAGATAGAACAGG

ACGCAGACCTAATCCTTTTCCTCCACAGACCCGAGTACTACAAGAAAAAG 12 00 
CCAAATCCCGAAGAGCAGGGTATAGCGGAAGTGATAATAGCCAAGCAAAG

GCAAGGACCCACGGACATTGTGAAGCTCGCATTTATTAAGGAGTACACTA 1300
AGTTTGCAAACCTAGAAGCCCTTCCTGAACAACCTCCTGAAGAAGAGGAA 

CTTTCCGAAATTATTGAAACACAGGAGGATGAAGGATTCGAAGATATTGA 14 0 0 
CTTCTGAAAATTAAGGTTTTATAATTTTATCTTGGCTATCCGGGGTAGCT 

CAATCGGCAGAGCGGGTGGCTG 14 72 
MQFVDKLPCDESAERAVLGSMLEDPENIPLVLEYLKEEDFCIDEHKLLFR

VLTNLWSEYGNKLDFVLIKDHLEKKNLLQKIPIDWLEELYEEAVSPDTLE 100
EVCKIVKQRSAQRAIIQLGITSTQFYHVKDVAEEVIELIYKFKSSDRLVT

GLPSGFTELDLKTTGFHPGDLIILAARPGMGKTAFMLSIIYNLAKDEGKP 200
SAVFSLEMSKEQLVMRLLSMMSEVPLFKIRSGSISNEDLKKLEASAIELA 

KYDIYLDDTPALTTTDLRIRARKLRKEKEVEFVAVDYLQLLRPPVRKSSR 3 0 0 
QEEVAEVSRNLKALAKELHIPVMALAQLSREVEKRSDKRPQLADLRESGQ

IEQDADLILFLHRPEYYKKKPNPEEQGIAEVIIAKQRQGPTDIVKLAFIK 400
EYTKFANLEALPEQPPEEEELSEIIETQEDEGFEDIDF
ATGTCCTCGGACATAGACGAACTTAGACGGGAAATAGATATAGTAGACGT

CATTTCCGAATACTTAAACTTAGAGAAGGTAGGTTCCAATTACAGAACGA 10 0 

ACTGTCCCTTTCACCCTGACGATACACCCTCCTTTTACGTGTCTCCAAGT 

AAACAAATATTCAAGTGTTTCGGTTGCGGGGTAGGGGGAGACGCGATAAA 2 0 0 

GTTCGTTTCCCTTTACGAGGACATCTCCTATTTTGAAGCCGCCCTTGAAC 

TCGCAAAACGCTACGGAAAGAAATTAGACCTTGAAAAGATATCAAAAGAC 300

GAAAAGGTATACGTGGCTCTTGACAGGGTTTGTGATTTCTACAGGGAAAG 

CCTTCTCAAAAACAGAGAGGCAAGTGAGTACGTAAAGAGTAGGGGAATAG 4 0 0 

ACCCTAAAGTAGCGAGGAAGTTTGATCTTGGGTACGCACCTTCCAGTGAA 

GCACTCGTAAAAGTCTTAAAAGAGAACGATCTTTTAGAGGCTTACCTTGA 5 0 0 

AACTAAAAACCTCCTTTCTCCTACGAAGGGTGTTTACAGGGATCTCTTTC 

TTCGGCGTGTCGTGATCCCGATAAAGGATCCGAGGGGAAGAGTTATAGGT 6 0 0 

TTCGGTGGAAGGAGGATAGTAGAGGACAAATCTCCCAAGTACATAAACTC

TCCAGACAGCAGGGTATTTAAAAAGGGGGAGAACTTATTCGGTCTTTACG 7 0 0 

AGGCAAAGGAGTATATAAAGGAAGAAGGATTTGCGATACTTGTGGAAGGG 

TACTTTGACCTTTTGAGACTTTTTTCCGAGGGAATAAGGAACGTTGTTGC 8 0 0 

ACCCCTCGGTACAGCCCTGACCCAAAATCAGGCAAACCTCCTTTCCAAGT 

TCACAAAAAAGGTCTACATCCTTTACGACGGAGATGATGCGGGAAGAAAG 9 0 0 

GCTATGAAAAGTGCCATTCCCCTACTCCTCAGTGCAGGAGTGGAAGTTTA 

TCCCGTTTACCTCCCCGAAGGATACGATCCCGACGAGTTTATAAAGGAAT 10 00 

TCGGGAAAGAGGAATTAAGAAGACTGATAAACAGCTCAGGGGAGCTCTTT 

GAAACGCTCATAAAAACCGCAAGGGAAAACTTAGAGGAGAAAACGCGTGA 110 0 

GTTCAGGTATTATCTGGGCTTTATTTCCGATGGAGTAAGGCGCTTTGCTC 

TGGCTTCGGAGTTTCACACCAAGTACAAAGTTCCTATGGAAATTTTATTA 12 0 0 

ATGAAAATTGAAAAAAATTCTCAAGAAAAAGAAATTAAACTCTCCTTTAA 

GGAAAAAATCTTCCTGAAAGGACTGATAGAATTAAAACCAAAAATAGACC 1300

TTGAAGTCCTGAACTTAAGTCCTGAGTTAAAGGAACTCGCAGTTAACGCC 

TTAAACGGAGAGGAGCATTTACTTCCAAAAGAAGTTCTCGAGTACCAGGT 14 0 0 

GGATAACTTGGAGAAACTTTTTAACAACATCCTTAGGGATTTACAAAAAT 

CTGGGAAAAAGAGGAAGAAAAGAGGGTTGAAAAATGTAAATACTTAATTA 15 0 0 

ACTTTAATAAATTTTTAGAGTTAGGA 
MSSDIDELRREIDIVDVISEYLNLEKVGSNYRTNCPFHPDDTPSFYVSPS

KQIFKCFGCGVGGDAIKFVSLYEDISYFEAALELAKRYGKKLDLEKISKD 10 0 
EKVYVALDRVCDFYRESLLKNREASEYVKSRGIDPKVARKFDLGYAPSSE 

ALVKVLKENDLLEAYLETKNLLSPTKGVYRDLFLRRWIPIKDPRGRVIG 2 0 0 
FGGRRIVEDKSPKYINSPDSRVFKKGENLFGLYEAKEYIKEEGFAILVEG

YFDLLRLFSEGIRNVVAPLGTALTQNQANLLSKFTKKVYILYDGDDAGRK 300
AMKSAIPLLLSAGVEVYPVYLPEGYDPDEFIKEFGKEELRRLINSSGELF 

ETLIKTARENLEEKTREFRYYLGFISDGVRRFALASEFHTKYKVPMEILL 4 0 0 
MKIEKNSQEKEIKLSFKEKIFLKGLIELKPKIDLEVLNLSPELKELAVNA 

LNGEEHLLPKEVLEYQVDNLEKLFNNILRDLQKSGKKRKKRGLKNVNT 498
ATGCAAGATACCGCTACCTGCAGTATTTGTCAGGGGACGGGATTCGTAAA

GACCGAAGACAACAAGGTAAGGCTCTGCGAATGCAGGTTCAAGAAAAGGG 100 

ATGTAAACAGGGAACTAAACATCCCAAAGAGGTACTGGAACGCCAACTTA 

GACACTTACCACCCCAAGAACGTATCCCAGAACAGGGCACTTTTGACGAT 2 00 

AAGGGTCTTCGTCCACAACTTCAATCCCGAGGAAGGGAAAGGGCTTACCT 

TTGTAGGATCTCCTGGAGTCGGCAAAACTCACCTTGCGGTTGCAACATTA 3 0 0 

AAAGCGATTTATGAGAAGAAGGGAATCAGAGGATACTTCTTCGATACGAA 

GGATCTAATATTCAGGTTAAAACACTTAATGGACGAGGGAAAGGATACAA 4 00 

AGTTTTTAAAAACTGTCTTAAACTCACCGGTTTTGGTTCTCGACGACCTC

GGTTCTGAGAGGCTCAGTGACTGGCAGAGGGAACTCATCTCTTACATAAT 50 0 

CACTTACAGGTATAACAACCTTAAGAGCACGATAATAACCACGAATTACT 

CACTCCAGAGGGAAGAAGAGAGTAGCGTGAGGATAAGTGCGGATCTTGCA 6 0 0 

AGCAGACTCGGAGAAAACGTAGTTTCAAAAATTTACGAGATGAACGAGTT 

GCTCGTTATAAAGGGTTCCGACCTCAGGAAGTCTAAAAAGCTATCAACCC 700

CATCT 
MQDTATCSICQGTGFVKTEDNKVRLCECRFKKRDVNRELNIPKRYWNANL
DTYHPKNVSQNRALLTIRVFVHNFNPEEGKGLTFVGSPGVGKTHLAVATL 100 
KAIYEKKGIRGYFFDTKDLIFRLKHLMDEGKDTKFLKTVLNSPVLVLDDL 
GSERLSDWQRELISYIITYRYNNLKSTIITTNYSLQREEESSVRISADLA 200
SRLGENVVSKIYEMNELLVIKGSDLRKSKKLSTPS
ATGAAAAAGATTGAAAATTTGAAGTGGAAAAATGTCTCGTTTAAAAGCCT

GGAAATAGATCCCGATGCAGGTGTGGTTCTCGTTTCCGTGGAAAAATTCT 10 0 
CCGAAGAGATAGAAGACCTTGTGCGTTTACTGGAGAAGAAGACGCGGTTT 

CGAGTCATCGTGAACGGTGTTCAAAAAAGTAACGGGGATCTAAGGGGAAA 2 0 0 
GATACTTTCCCTTCTCAACGGTAATGTGCCTTACATAAAAGATGTTGTTT 

TCGAAGGAAACAGGCTGATTCTGAAAGTGCTTGGAGATTTCGCGCGGGAC 3 0 0 
AGGATCGCCTCCAAACTCAGAAGCACGAAAAAACAGCTCGATGAACTGCT 

GCCTCCCGGAACAGAGATCATGCTGGAGGTTGTGGAGCCTCCGGAAGATC 4 0 0 
TTTTGAAAAAGGAAGTACCACAACCAGAAAAGAGAGAAGAACCAAAGGGT 

GAAGAATTGAAGATCGAGGATGAAAACCACATCTTTGGACAGAAACCCAG 50 0 
AAAGATCGTCTTCACCCCCTCAAAAATCTTTGAGTACAACAAAAAGACAT

CGGTGAAGGGCAAGATCTTCAAAATAGAGAAGATCGAGGGGAAAAGAACG 600
GTCCTTCTGATTTACCTGACAGACGGAGAAGATTCTCTGATCTGCAAAGT 

CTTCAACGACGTTGAAAAGGTCGAAGGGAAAGTATCGGTGGGAGACGTGA 70 0 
TCGTTGCCACAGGAGACCTCCTTCTCGAAAACGGGGAGCCCACCCTTTAC 

GTGAAGGGAATCACAAAACTTCCCGAAGCGAAAAGGATGGACAAATCTCC 800
GGTTAAGAGGGTGGAGCTCCACGCCCATACCAAGTTCAGCGATCAGGACG

CAATAACAGATGTGAACGAATATGTGAAACGAGCCAAGGAATGGGGCTTT 9 0 0 
CCCGCGATAGCCCTCACGGATCATGGGAACGTTCAGGCCATACCTTACTT 

CTACGACGCGGCGAAAGAAGCTGGAATAAAGCCCATTTTCGGTATCGAAG 10 0 0 
CGTATCTGGTGAGTGACGTGGAGCCCGTCATAAGGAATCTCTCCGACGAT 

TCGACGTTTGGAGATGCCACGTTCGTCGTCCTCGACTTCGAGACGACGGG 110 0 
TCTCGACCCGCAGGTGGATGAGATCATCGAGATAGGAGCGGTGAAGATAC 

AGGGTGGCCAGATAGTGGACGAGTACCACACTCTCATAAAGCCTTCCAGG 12 0 0 
GAGATCTCAAGAAAAAGTTCGGAGATCACCGGAATCACTCAAGAGATGCT

GGAAAACAAGAGAAGCATCGAGGAAGTTCTGCCGGAGTTCCTCGGTTTTC 13 0 0 
TGGAAGATTCCATCATCGTAGCACACAACGCCAACTTCGACTACAGATTT 

CTGAGGCTGTGGATCAAAAAAGTGATGGGATTGGACTGGGAAAGACCCTA 14 0 0 
CATAGATACGCTCGCCCTCGCAAAGTCCCTTCTCAAACTGAGAAGCTACT 

CTCTGGATTCCGTTGTGGAAAAGCTCGGATTGGGTCCCTTCCGGCACCAC 15 0 0 
AGGGCCCTGGATGACGCGAGGGTCACCGCTCAGGTTTTCCTCAGGTTCGT 

TGAGATGATGAAGAAGATCGGTATCACGAAGCTTTCAGAAATGGAGAAGT 16 0 0 
TGAAGGATACGATAGACTACACCGCGTTGAAACCCTTCCACTGCACGATC 

CTCGTTCAGAACAAAAAGGGATTGAAAAACCTATACAAACTGGTTTCTGA 17 0 0 
TTCCTATATAAAGTACTTCTACGGTGTTCCGAGGATCCTCAAAAGTGAGC 

TCATCGAGAACAGAGAAGGACTGCTCGTGGGTAGCGCGTGTATCTCCGGT 18 0 0 
GAGCTCGGACGTGCCGCCCTCGAAGGAGCGAGTGATTCAGAACTCGAAGA 

GATCGCGAAGTTCTACGACTACATAGAAGTCATGCCGCTCGACGTTATAG 190 0 
CCGAAGATGAAGAAGACCTAGACAGAGAAAGACTGAAAGAAGTGTACCGA 

AAACTCTACAGAATAGCGAAAAAATTGAACAAGTTCGTCGTCATGACCGG 2 0 0 0 
TGATGTTCATTTCCTCGATCCCGAAGATGCCAGGGGCAGAGCTGCACTTC 

TGGCACCTCAGGGAAACAGAAACTTCGAGAATCAGCCCGCACTCTACCTC 210 0 
AGAACGACCGAAGAAATGCTCGAGAAGGCGATAGAGATATTCGAAGATGA

AGAGATCGCGAGGGAAGTCGTGATAGAGAATCCCAACAGAATAGCCGATA 2200
TGATCGAGGAAGTGCAGCCGCTCGAGAAAAAACTTCACCCGCCGATCATA 

GAGAACGCCGATGAAATAGTGAGAAACCTCACCATGAAGCGGGCGTACGA 2 3 0 0 
GATCTACGGTGATCCGCTTCCCGAAATCGTCCAGAAGCGTGTGGAAAAGG 
AACTGAACGCCATCATAAATCATGGATACGCCGTTCTCTATCTCATCGCT 2400

CAGGAGCTCGTTCAGAAATCTATGAGCGATGGTTACGTGGTTGGATCCAG 

AGGATCCGTCGGGTCTTCACTCGTGGCCAATCTCCTCGGAATAACAGAGG 2 5 0 0 

TGAATCCCCTACCACCACATTACAGGTGTCCAGAGTGCAAATACTTTGAA 

GTTGTCGAAGACGACAGATACGGAGCGGGTTACGACCTTCCCAACAAGAA 2 6 0 0 

CTGTCCAAGATGTGGGGCTCCTCTCAGAAAAGACGGCCACGGCATACCGT 

TTGAAACGTTCATGGGGTTCGAGGGTGACAAGGTCCCCGACATAGATCTC 2 70 0 

AACTTCTCAGGAGAGTATCAGGAACGTGCTCATCGTTTTGTGGAAGAACT 

CTTCGGTAAAGACCACGTCTATAGGGCGGGAACCATAAACACCATCGCGG 2800

AAAGAAGTGCGGTGGGTTACGTGAGAAGCTACGAAGAGAAAACCGGAAAG 

AAGCTCAGAAAGGCGGAAATGGAAAGACTCGTTTCCATGATCACGGGAGT 2 9 0 0 

GAAGAGAACGACGGGTCAGCACCCAGGGGGGCTCATGATCATACCGAAAG

ACAAAGAAGTCTACGATTTCACTCCCATACAGTATCCAGCCAACGATAGA 3000

AACGCAGGTGTGTTCACCACGCACTTCGCATACGAGACGATCCATGATGA 

CCTGGTGAAGATAGATGCGCTCGGCCACGATGATCCCACTTTCATCAAGA 310 0 

TGCTCAAGGACCTCACCGGAATCGATCCCATGACGATTCCCATGGATGAC 

CCCGATACGCTCGCCATATTCAGTTCTGTGAAGCCTCTTGGTGTGGATCC 32 0 0 

CGTTGAGCTGGAAAGCGATGTGGGAACGTACGGAATTCCGGAGTTCGGAA 

CCGAGTTTGTGAGGGGAATGCTCGTTGAAACGAGACCAAAGAGTTTCGCC 3 3 0 0 

GAGCTTGTGAGAATCTCAGGACTGTCACACGGTACGGACGTCTGGTTGAA 

CAACGCACGTGATTGGATAAACCTCGGCTACGCCAAGCTCTCCGAGGTTA 34 0 0 

TCTCGTGTAGGGACGACATCATGAACTTCCTCATACACAAAGGAATGGAA 

CCGTCACTTGCCTTCAAGATCATGGAAAACGTCAGGAAGGGAAAGGGTAT 3 5 0 0 

CACAGAAGAGATGGAGAGCGAGATGAGAAGGCTGAAGGTTCCAGAATGGT 

TCATCGAATCCTGTAAAAGGATCAAATATCTCTTCCCGAAAGCTCACGCT 3 6 0 0 

GTGGCTTACGTGAGTATGGCCTTCAGAATTGCTTACTTCAAGGTTCACTA 

TCCTCTTCAGTTTTACGCGGCGTACTTCACGATAAAAGGTGATCAGTTCG 3 7 0 0 

ATCCGGTTCTCGTACTCAGGGGAAAAGAAGCCATAAAGAGGCGCTTGAGA 

GAACTCAAAGCGATGCCTGCCAAAGACGCCCAGAAGAAAAACGAAGTGAG 3 8 0 0 

TGTTCTGGAGGTTGCCCTGGAAATGATACTGAGAGGTTTTTCCTTCCTAC 

CGCCCGACATCTTCAAATCCGACGCGAAGAAATTTCTGATAGAAGGAAAC 3 9 0 0 

TCGCTGAGAATTCCGTTCAACAAACTTCCAGGACTGGGTGACAGCGTTGC 

CGAGTCGATAATCAGAGCCAGGGAAGAAAAGCCGTTCACTTCGGTGGAAG 4 0 0 0 

ATCTCATGAAGAGGACCAAGGTCAACAAAAATCACATAGAGCTGATGAAA 

AGCCTGGGTGTTCTCGGGGACCTTCCAGAGACGGAACAGTTCACGCTTTT 410 0 

C 
MKKIENLKWKNVSFKSLEIDPDAGVVLVSVEKFSEEIEDLVRLLEKKTRF

RVIVNGVQKSNGDLRGKILSLLNGNVPYIKDWFEGNRLILKVLGDFARD 100 

RIASKLRSTKKQLDELLPPGTEIMLEWEPPEDLLKKEVPQPEKREEPKG 

EELKIEDENHIFGQKPRKIVFTPSKIFEYNKKTSVKGKIFKIEKIEGKRT 200 

VLLIYLTDGEDSLICKVFNDVEKVEGKVSVGDVIVATGDLLLENGEPTLY 

VKGITKLPEAKRMDKSPVKRVELHAHTKFSDQDAITDVNEYVKRAKEWGF 3 0 0 

PAIALTDHGNVQAIPYFYDAAKEAGIKPIFGIEAYLVSDVEPVIRNLSDD

STFGDATFWLDFETTGLDPQVDEIIEIGAVKIQGGQIVDEYHTLIKPSR 400 

EISRKSSEITGITQEMLENKRSIEEVLPEFLGFLEDSIIVAHNANFDYRF

LRLWIKKVMGLDWERPYIDTLALAKSLLKLRSYSLDSWEKLGLGPFRHH 50 0 

RALDDARVTAQVFLRFVEMMKKIGITKLSEMEKLKDTIDYTALKPFHCTI 

LVQNKKGLKNLYKLVSDSYIKYFYGVPRILKSELIENREGLLVGSACISG 600

ELGRAALEGASDSELEEIAKFYDYIEVMPLDVIAEDEEDLDRERLKEVYR 

KLYRIAKKLNKFVVMTGDVHFLDPEDARGRAALLAPQGNRNFENQPALYL 700

RTTEEMLEKAIEIFEDEEIAREWIENPNRIADMIEEVQPLEKKLHPPII 

ENADEIVRNLTMKRAYEIYGDPLPEIVQKRVEKELNAIINHGYAVLYLIA 800

QELVQKSMSDGYWGSRGSVGSSLVANLLGITEVNPLPPHYRCPECKYFE 

WEDDRYGAGYDLPNKNCPRCGAPLRKDGHGIPFETFMGFEGDKVPDIDL 900 

NFSGEYQERAHRFVEELFGKDHVYRAGTINTIAERSAVGYVRSYEEKTGK 

KLRKAEMERLVSMITGVKRTTGQHPGGLMIIPKDKEVYDFTPIQYPANDR 1000

NAGVFTTHFAYETIHDDLVKIDALGHDDPTFIKMLKDLTGIDPMTIPMDD 

PDTLAIFSSVKPLGVDPVELESDVGTYGIPEFGTEFVRGMLVETRPKSFA 110 0 

ELVRISGLSHGTDVWLNNARDWINLGYAKLSEVISCRDDIMNFLIHKGME 

PSLAFKIMENVRKGKGITEEMESEMRRLKVPEWFIESCKRIKYLFPKAHA 1200

VAYVSMAFRIAYFKVHYPLQFYAAYFTIKGDQFDPVLVLRGKEAIKRRLR 

ELKAMPAKDAQKKNEVSVLEVALEMILRGFSFLPPDIFKSDAKKFLIEGN 13 0 0 

SLRIPFNKLPGLGDSVAESIIRAREEKPFTSVEDLMKRTKVNKNHIELMK

SLGVLGDLPETEQFTLF 1367
GTGCTCGCCATGATATGGAACGACACCGTTTTTTGCGTCGTAGACACAGA

AACCACGGGAACCGATCCCTTTGCCGGAGACCGGATAGTTGAAATAGCCG 10 0 
CTGTTCCTGTCTTCAAGGGGAAGATCTACAGAAACAAAGCGTTTCACTCT 

CTCGTGAATCCCAGAATAAGAATCCCTGCGCTGATTCAGAAAGTTCACGG 2 0 0 
TATCAGCAACATGGACATCGTGGAAGCGCCAGACATGGACACAGTTTACG 

ATCTTTTCAGGGATTACGTGAAGGGAACGGTGCTCGTGTTTCACAACGCC 3 0 0 
AACTTCGACCTCACTTTTCTGGATATGATGGCAAAGGAAACGGGAAACTT 

TCCAATAACGAATCCCTACATCGACACACTCGATCTTTCAGAAGAGATCT 4 0 0 
TTGGAAGGCCTCATTCTCTCAAATGGCTCTCCGAAAGACTTGGAATAAAA 

ACCACGATACGGCACCGTGCTCTTCCAGATGCCCTGGTGACCGCAAGAGT 50 0 
TTTTGTGAAGCTTGTTGAATTTCTTGGTGAAAACAGGGTCAACGAATTCA 

TACGTGGAAAACGGGGG 56 7 
MLAMIWNDTVFCWDTETTGTDPFAGDRIVEIAAVPVFKGKIYRNKAFHS

LVNPRIRIPALIQKVHGISNMDIVEAPDMDTVYDLFRDYVKGTVLVFHNA 100

NFDLTFLDMMAKETGNFPITNPYIDTLDLSEEIFGRPHSLKWLSERLGIK 

TTIRHRALPDALVTARVFVKLVEFLGENRVNEFIRGKRG 189
GTGGAAGTTCTTTACAGGAAGTACAGGCCAAAGACTTTTTCTGAGGTTGT
CAATCAGGATCATGTGAAGAAGGCAATAATCGGTGCTATTCAGAAGAACA 100 
GCGTGGCCCACGGATACATATTCGGCGGTCCGAGGGGAACGGGGAAGACT 
ACTCTTGCCAGAATTCTCGCAAAATCCCTGAACTGTGAGAACAGAAAGGG 2 00 
AGTTGAACCCTGCAATTCCTGCAGAGCCTGCAGAGAGATAGACGAGGGAA
CCTTCATGGACGTGATAGAGCTCGACGCGGCCTCCAACAGAGGAATAGAC 3 00 
GAGATCAGAAGAATCAGAGACGCCGTTGGATACAGGCCGATGGAAGGTAA 
ATACAAAGTCTACATAATAGACGAAGTTCACATGCTCACGAAAGAAGCCT 4 00 
TCAACGCGCTCCTCAAAACACTCGAAGAACCTCCTTCCCACGTCGTGTTC 
GTGCTGGCAACGACAAACCTTGAGAAGGTTCCTCCCACGATTATCTCGAG 500 
ATGTCAGGTTTTCGAGTTCAGAAACATT C C CGACGAGCTCATCGAAAAGA 
GGCTCCAGGAAGTTGCGGAGGCTGAAGGAATAGAGATAGACAGGGAAGCT 6 0 0 
CTGAGCTTCATCGCAAAAAGAGCCTCTGGAGGCTTGAGAGACGCGCTCAC 
CATGCTCGAGCAGGTGTGGAAGTTCTCGGAAGGAAAGATAGATCTCGAGA 7 0 0 
CGGTACACAGGGCGCTCGGGTTGATACCGATACAGGTTGTTCGCGATTAC 
GTGAACGCTATCTTTTCTGGTGATGTGAAAAGGGTCTTCACCGTTCTCGA 8 0 0 
CGACGTCTATTACAGCGGGAAGGACTACGAGGTGCTCATTCAGGAAGCAG 
TCGAGGATCTGGTCGAAGACCTGGAAAGGGAGAGAGGGGTTTACCAGGTT 9 0 0 
TCAGCGAACGATATAGTTCAGGTTTCGAGACAACTTCTGAATCTTCTGAG 
AGAGATAAAGTTCGCCGAAGAAAAACGACTCGTCTGTAAAGTGGGTTCGG 10 0 0 
CTTACATAGCGACGAGGTTCTCCACCACAAACGTTCAGGAAAACGATGTC 
AGAGAAAAAAACGATAATTCAAATGTACAGCAGAAAGAAGAGAAGAAAGA 1100 
AACGGTGAAGGCAAAAGAAGAAAAAC AGGAAGACAG CGAGTT CGAGAAAC 
GCTTCAAAGAACTCATGGAAGAACTGAAAGAAAAGGGCGATCTCTCTATC 1200
TTTGTCGCTCTCAGCCTCTCAGAGGTGCAGTTTGACGGAGAAAAGGTGAT 
TATTTCTTTTGATTCATCGAAAGCTATGCATTACGAGTTGATGAAGAAAA 13 00 
AACTGCCTGAGCTGGAAAACATTTTTTCTAGAAAACTCGGGAAAAAAGTA 
GAAGTTGAACTTCGACTGATGGGAAAAGAAGAAACAATCGAGAAGGTTTC 14 0 0 
T CAGAAGAT C CTGAGATTGTTTGAAC AGGAGGGA 
MEVLYRKYRPKTFSEWNQDHVKKAIIGAIQKNSVAHGYIFAGPRGTGKT
TLARILAKSLNCENRKGVEPCNSCRACREIDEGTFMDVIELDAASNRGID
EIRRIRDAVGYRPMEGKYKVYIIDEVHMLTKEAFNALLKTLEEPPSHWF
VLATTNLEKVPPTIISRCQVFEFRNIPDELIEKRLQEVAEAEGIEIDREA
LSFIAKRASGGLRDALTMLEQVWKFSEGKIDLETVHRALGLIPIQWRDY
VNAIFSGDVKRVFTVLDDVYYSGKDYEVLIQEAVEDLVEDLERERGVYQV
SANDIVQVSRQLLNLLREIKFAEEKRLVCKVGSAYIATRFSTTNVQENDV
REKNDNSNVQQKEEKKETVKAKEEKQEDSEFEKRFKELMEELKEKGDLSI
FVALSLSEVQFDGEKVIISFDSSKAMHYELMKKKLPELENIFSRKLGKKV
EVELRLMGKEETIEKVSQKILRLFEQEG
ATGAAAGTAACCGTCACGACTCTTGAATTGAAAGACAAAATAACCATCGC

CTCAAAAGCGCTCGCAAAGAAATCCGTGAAACCCATTCTTGCTGGATTTC 100 
TTTTCGAAGTGAAAGATGGAAATTTCTACATCTGCGCGACCGATCTCGAG 

ACCGGAGTCAAAGCAACCGTGAATGCCGCTGAAATCTCCGGTGAGGCACG 2 00 
TTTTGTGGTACCAGGAGATGTCATTCAGAAGATGGTCAAGGTTCTCCCAG 

ATGAGATAACGGAACTTTCTTTAGAGGGGGATGCTCTTGTTATAAGTTCT 3 0 0 
GGAAGCACCGTTTTCAGGATCACCACCATGCCCGCGGACGAATTTCCAGA 

GATAACGCCTGCCGAGTCTGGAATAACCTTCGAAGTTGACACTTCGCTCC 4 0 0 
TCGAGGAAATGGTTGAAAAGGTCATCTTCGCCGCTGCCAAAGACGAGTTC 

ATGCGAAATCTGAATGGAGTTTTCTGGGAACTCCACAAGAATCTTCTCAG 5 0 0 
GCTGGTTGCAAGTGATGGTTTCAGACTTGCACTTGCTGAAGAGCAGATAG 

AAAACGAGGAAGAGG CGAGT TTCTTGCTCTCT T TGAAGAG C ATGAAAGAA 6 0 0 
GTTCAAAACGTGCTGGACAACACAACGGAGCCGACTATAACGGTGAGGTA 

CGATGGAAGAAGGGTTTCTCTGTCGACAAATGATGTAGAAACGGTGATGA 7 0 0 
GAGTGGTCGACGCTGAATTTC C CGATTACAAAAGGGTGAT C C C CGAAACT 

TTCAAAACGAAAGTGGTGGTTTCCAGAAAAGAACTCAGGGAATCTTTGAA 8 0 0 
GAGGGTGATGGTGATTGCCAGCAAGGGAAGCGAGTCCGTGAAGTTCGAAA 

TAGAAGAAAACGTTATGAGACTTGTGAGCAAGAGCCCGGATTATGGAGAA 9 0 0 
GTGGTCGATGAAGTTGAAGTTCAAAAAGAAGGGGAAGATCTCGTGATCGC 

TTTCAACCCGAAGTTCATCGAGGACGTTTTGAAGCACATTGAGACTGAAG 10 0 0 
AAATCGAAATGAACTTCGTTGATTCTACCAGTCCATGTCAGATAAATCCA 

CTCGATATTTCTGGATACCTTTACATAGTGATGCCCATCAGACTGGCA 10 98 



FIG. 60 



MKVTVTTLELKDKITIASKALAKKSVKPILAGFLFEVKDGNFYICATDLE 
TGVKATVNAAE I SGEARFWPGDVI QKMVKVLPDE I TELSLEGDALVI SS 10 0 
GSTVFRITTMPADEFPEITPAESGITFEVDTSLLEEMVEKVIFAAAKDEF 
MRNLNGVFWELHKNLLRLVASDGFRLALAEEQIENEEEASFLLSLKSMKE 200 
VQNVLDNTTE PT I TVRYDGRRVS L S TNDVE TVMRWDAE FPDYKRV I PET 
FKTKVWSRKELRESLKRVMVIASKGSESVKFEIEENVMRLVSKSPDYGE 3 00 
WDEVEVQKEGEDLVIAFNPKFIEDVLKHIETEEIEMNFVDSTSPCQINP 
LDISGYLYIVMPIRLA 3 66 
ATGCCAGTCACGTTTCTCACAGGTACTGCAGAAACTCAGAAGGAAGAATT

GATAAAGAAACT C C T GAAGGAT GGT AACGTGGAGTAC AT AAGGAT C CAT C 10 0 
CGGAGGATCCCGACAAGATCGATTTCATAAGGTCTTTACTCAGGACAAAG 

ACGATCTTTTCCAACAAGACGATCATTGACATCGTCAATTTCGATGAGTG 2 0 0 
GAAAGCACAGGAGCAGAAGCGTCTCGTTGAACTTTTGAAAAACGTACCGG 

AAGACGTT CAT AT C T T CAT C C GT T CT C AAAAAAC AGGTGGAAAGGGAGT A 3 0 0 
GCGCTGGAGCTTCCGAAGCCATGGGAAACGGACAAGTGGCTTGAGTGGAT 

AGAAAAGCGCTTCAGGGAGAATGGTTTGCTCATCGATAAAGATGCCCTTC 4 0 0 
AGCTGTTTTTCTCCAAGGTTGGAACGAACGACCTGATCATAGAAAGGGAG 

ATTGAAAAACTGAAAGC T TAT T C CGAGGACAGAAAGATAACGGTAGAAGA 50 0 
CGTGGAAGAGGTCGTTTTTACCTATCAGACTCCGGGATACGATGATTTTT 

GCTTTGCTGTTTCCGAAGGAAAAAGGAAGCTCGCTCACTCTCTTCTGTCG 6 0 0 
CAGCTGTGGAAAACCACAGAGTCCGTGGTGATTGCCACTGTCCTTGCGAA 

TCACTTCTTGGATCTCTTCAAAATCCTCGTTCTTGTGACAAAGAAAAGAT 70 0 
ACTACACCTGGCCTGATGTGTCCAGGGTGTCCAAAGAGCTGGGAATTCCC 

GTTCCTCGTGTGGCTCGTTTCCTCGGTTTCTCCTTTAAGACCTGGAAATT 8 0 0 
CAAGGTGATGAACCACCTCCTCTACTACGATGTGAAGAAGGTTAGAAAGA 

TACTGAGGGATCTCTACGATCTGGACAGAGCCGTGAAAAGCGAAGAAGAT 90 0 
CCAAAACCGTTCTTCCACGAGTTCATAGAAGAGGTGGCACTGGATGTATA 

TTCTCTTCAGAGAGATGAAGAA 9 72 



FIG. 62 



MPVTFLTGTAETQKEELIKKLLKDGNVEYIRIHPEDPDKIDFIRSLLRTK 
TIFSNKTIIDIVNFDEWKAQEQKRLVELLKNVPEDVHIFIRSQKTGGKGV 10 0 
ALELPKPWETDKWLEWIEKRFRENGLLIDKDALQLFFSKVGTNDLIIERE 
IEKLKAYSEDRKITVEDVEEWFTYQTPGYDDFCFAVSEGKRKLAHSLLS 2 00 
QLWKTTESWIATVLANHFLDLFKILVLVTKKRYYTWPDVSRVSKELGIP 
VPRVARFLGFSFKTWKFKVMNHLLYYDVKKVRKILRDLYDLDRAVKSEED 3 00 
PKPFFHEFIEEVALDVYSLQRDEE 



FIG. 63 



ATGAACGATTTGATCAGAAAGTACGCTAAAGATCAACTGGAAACTTTGAA 

AAGGAT CATAGAAAAGTCTGAAGGAAT AT C CAT C CT CAT AAATGGAGAAG 100 
ATCTCTCGTATCCGAGAGAAGTATCCCTTGAACTTCCCGAGTACGTGGAG 

AAATTTCCCCCGAAGGCCTCGGATGTTCTGGAGATAGATCCCGAGGGGGA 2 00 
GAACATAGGCATAGACGACATCAGAACGATAAAGGACTTCCTGAACTACA 

GCCCCGAGCTCTACACGAGAAAGTACGTGATAGTCCACGACTGTGAAAGA 3 00 
ATGACCCAGCAGGCGGCGAACGCGTTTCTGAAGGCCCTTGAAGAACCACC 

AGAATACGCTGTGATCGTTCTGAACACTCGCCGCTGGCATTATCTACTGC 4 0 0 
CGACGATAAAGAGCCGAGTGTTCAGAGTGGTTGTGAACGTTCCAAAGGAG 

TTCAGAGATCTCGTGAAAGAGAAAATAGGAGATCTCTGGGAGGAACTTCC 5 0 0 
ACTTCTTGAGAGAGACTTCAAAACGGCTCTCGAAGCCTACAAACTTGGTG 

CGGAAAAACTTTCTGGATTGATGGAAAGTCTCAAAGTTTTGGAGACGGAA 6 0 0 
AAACTCTTGAAAAAGGTCCTTTCAAAAGGCCTCGAAGGTTATCTCGCATG 

TAGGGAGCTCCTGGAGAGATTTTCAAAGGTGGAATCGAAGGAATTCTTTG 70 0 

TTGATCCAGAGACTGACAAGAAT CATT CT C CACGAAAACACATGGGAAAG 8 0 0 
CGTTGAAGATCAAAAAAGCGTGTCTTTCCTCGATTCAATTCTCAGGGTGA 

AGATAGCGAATCTGAACAACAAACTCACTCTGATGAACATCCTCGCGATA 9 0 0 
CAC AGAGAGAGAAAGAGAGGTGT C AAC G C T T GGAG C 
MNDLIRKYAKDQLETLKRIIEKSEGISILINGEDLSYPREVSLELPEYVE

KFPPKASDVLEIDPEGENIGIDDIRTIKDFLNYSPELYTRKYVIVHDCER 100 

MTQQAANAFLKALEE PPE YAVI VLNTRRWHYLL PT I KSRVFRVWNVPKE 

FRDLVKEKIGDLWEELPLLERDFKTALEAYKLGAEKLSGLMESLKVLETE 2 00 

KLLKKVLSKGLEGYLACRELLERFSKVESKEFFALFDQVTNTITGKDAFL 

L I QRLTR 1 1 LHENTWES VEDKS VS FLD S I LRVK I ANLNNKLTLMN ILAIH 3 0 0 

RERKRGVNAWS 
ATGTCTTTCTTCAACAAGATCATACTCATAGGAAGACTCGTGAGAGATCC

CGAAGAGAGATACACGCTCAGCGGAACTCCAGTCACCACCTTCACCATAG 10 0 
CGGTGGACAGGGTTCCCAGAAAGAACGCGCCGGACGACGCTCAAACGACT 

GATTTCTTCAGGATCGTCACCTTTGGAAGACTGGCAGAGTTCGCTAGAAC 2 0 0 
CTATCTCACCAAAGGAAGGCTCGTTCTCGTCGAAGGTGAAATGAGAATGA 

GAAGATGGGAAAC AC C CACTGGAGAAAAGAGGGT AT C T C CGGAGGT T GT C 3 0 0 
GCAAACGTTGTTAGATTCATGGACAGAAAACCTGCTGAAACAGTTAGCGA 

GACTGAAGAGGAGCTGGAAATACCGGAAGAAGACTTTTCCAGCGATACCT 4 0 0 
TCAGTGAAGATGAAC C AC CAT T T 



FIG. 66 



MSFFNKI ILIGRLVRDPEERYTLSGTPVTTFTIAVDRVPRKNAPDDAQTT 
DFFRIVTFGRLAEFARTYLTKGRLVLVEGEMRMRRWETPTGEKRVSPEW 10 0 
ANWRFMDRKPAETVSETEEELEIPEEDFSSDTFSEDEPPF 
ATGCGTGTTCCCCCGCACAACTTAGAGGCCGAAGTTGCTGTGCTCGGAAG

CATATTGATAGATCCGTCGGTAATAAACGACGTTCTTGAAATTTTGAGCC 10 0 
ACGAAGATTTCTATCTGAAAAAACACCAACACATCTTCAGAGCGATGGAA 

GAGCTTTACGACGAAGGAAAACCGGTGGACGTGGTTTCCGTCTGTGACAA 2 0 0 
GCTTCAAAGCATGGGAAAACTCGAGGAAGTAGGTGGAGATCTGGAAGTGG 

CCCAGCTCGCTGAGGCTGTGCCCAGTTCTGCACACGCACTTCACTACGCG 3 0 0 
G AGAT CGT C AAGGAAAAAT C C AT T C T G AGGAAA C T CAT T GAGAT C T C C AG 

AAAAATCTCAGAAAGTGCCTACATGGAAGAAGATGTGGAGATCCTGCTCG 4 0 0 
AC AAC G C AG AAAAGAT GAT C T T C GAGAT C T C AGAGAT GAAAAC G AC AAAA 

TCCTACGATCATCTGAGAGGCATCATGCACCGGGTGTTTGAAAACCTGGA 50 0 
GAAC T T C AGGGAAAGAGC C AAC C T T AT AGAAC C CGGTGTGCT CATAAC GG 

GACTACCAACGGGATTCAAAAGTCTGGACAAACAGACCACAGGGTTCCAC 6 0 0 
AGCTCCGATCTGGTGATAATAGCAGCGAGACCCTCCATGGGAAAAACCTC 

CTTCGCACTCTCAATAGCGAGGAACATGGCTGTCAATTTCGAAATCCCCG 700 
TCGGAATATTCAGTCTCGAGATGTCCAAGGAACAGCTCGCTCAAAGACTA 

CTCAGCATGGAGTCCGGTGTGGATCTTTACAGCATCAGAACAGGATACCT 8 0 0 
GGATCAGGAGAAGTGGGAAAGACTCACAATAGCGGCTTCTAAACTCTACA 

AAGCACCCATAGTTGTGGACGATGAGTCACTCCTCGATCCGCGATCGTTG 90 0 
AGGGC AAAAGC GAGAAGGATG AAAAAAGAATACGATGTAAAAGC C AT T T T 

TGTCGACTATCTCCAGCTCATGCACCTGAAAGGAAGAAAAGAAAGCAGAC 10 0 0 
AGCAGGAGATATCCGAGATCTCGAGATCTCTGAAGCTCCTTGCGAGGGAA 

CTCGACATAGTGGTGATAGCGCTTTCACAGCTTTCGAGGGCCGTAGAACA 110 0 
GAGAGAAGACAAAAGACCGAGGCTGAGTGACCTCAGGGAATCCGGTGCGA 

TAGAACAGGACGCAGACAC AGT CAT C T T CAT CTACAGGGAGGAATATTAC 12 0 0 
AGGAGC AAAAAAT C CAAAGAGGAAAGCAAG C TT CACGAAC CT CACGAAGC 

TGAAATCATAATAGGTAAACAGAGAAACGGTCCCGTTGGAACGATCACTC 13 0 0 
TGATCTTCGACCCCAGAACGGTTACGTTCCATGAAGTCGATGTGGTGCAT 

TCA 1353 
MRVPPHNLEAEVAVLGSILIDPSVINDVLEILSHEDFYLKKHQHIFRAME

ELYDEGKPVDWSVCDKLQSMGKLEEVGGDLEVAQLAEAVPSSAHALHYA 10 0 

EIVKEKSILRKLIEISRKISESAYMEEDVEILLDNAEKMIFEISEMKTTK 

S YDHLRGIMHRVFENLENFRERANL I E PGVL I TGLPTGFKSLDKQTTGFH 2 0 0 

SSDLVI IAARPSMGKTSFALSIARNMAVNFEIPVGIFSLEMSKEQLAQRL 

LSMESGVDLYS I RTGYLDQEKWERLT I AASKLYKAP I WDDESLLDPRSL 3 0 0 

RAKARRMKKEYDVKAI FVDYLQLMHLKGRKESRQQE I SE I SRSLKLLARE 

LDIWIALSQLSRAVEQREDKRPRLSDLRESGAIEQDADTVIFIYREEYY 4 00 

RSKKSKEESKLHEPHEAEI I IGKQRNGPVGTITLIFDPRTVTFHEVDWH 

S 451 
GTGATTCCTCGAGAGGTCATCGAGGAAATAAAAGAAAAGGTTGACATCGT

AGAGGTCATTTCCGAGTACGTGAATCTTACCCGGGTAGGTTCCTCCTACA 10 0 

GGGCTCTCTGTCCCTTTCATTCAGAAACCAATCCTTCTTTCTACGTTCAT 

CCGGGTTTGAAGATATACCATTGTTTCGGCTGCGGTGCGAGTGGAGACGT 2 0 0 

CATCAAATTTCTTCAAGAAATGGAAGGGATCAGTTTCCAGGAAGCGCTGG 

AAAGACTTGCCAAAAGAGCTGGGATTGATCTTTCTCTCTACAGAACAGAA 3 0 0 

GGGACTTCTGAATACGGAAAATACATTCGTTTGTACGAAGAAACGTGGAA 

AAGGT ACGT CAAAGAG C TGGAGAAAT CGAAAGAGGC AAAAGAC TAT T T AA 4 0 0 

AAAGCAGAGGCTTCTCTGAAGAAGATATAGCAAAGTTCGGCTTTGGGTAC 

GT C C C C AAGAGAT C CAGCAT C TCTAT AGAAGTTGCAGAAGGCATGAACAT 5 0 0 

AACACTGGAAGAACTTGTCAGATACGGTATCGCGCTGAAAAAGGGTGATC 

GATT C GT T GAT AGAT T C GAAGGAAGAAT CGT TGTT C C AATAAAGAACGAC 6 0 0 

AGTGGTCATATTGTGGCTTTTGGTGGGCGTGCTCTCGGCAACGAAGAACC 

GAAGT AT T T GAAC T C T C C AGAGAC CAGGT AT T T T T CGAAGAAGAAGAC C C 7 0 0 

GTCATCACCGAAGGCTACTTCGACGCGCTCGCATTCAGAAAGGATGGAAT 8 0 0 
ACCAACGGCGGTCGCTGTTCTTGGGGCGAGTCTTTCAAGAGAGGCGATTC 

TAAAACTTTCGGCGTATTCGAAAAACGTCATACTGTGTTTCGATAATGAC 9 0 0 
AAAGCAGG CTT CAGAG C C AC T C T C AAAT C C CT CGAGGAT C T C CT AGAC T A 

CGAATTCAACGTGCTTGTGGCAACCCCCTCTCCTTACAAAGACCCAGATG 10 0 0 
AACTCTTTCAGAAAGAAGGAGAAGGTTCATTGAAAAAGATGCTGAAAAAC 

TCGCGTTCGTTCGAATATTTTCTGGTGACGGCTGGTGAGGTCTTCTTTGA 110 0 
CAGGAACAGCCCCGCGGGTGTGAGATCCTACCTTTCTTTCCTCAAAGGTT 

GGGTCCAAAAGATGAGAAGGAAAGGATATTTGAAACACATAGAAAATCTC 12 0 0 
GTGAATGAGGTTTCATCTTCTCTCCAGATACCAGAAAACCAGATTTTGAA 

CGTCAAAGGTTTACGATGAGGGGAGAGGACTGGCTTATTTGTTTTTGAAC 
TACGAGGATTTGAGGGAAAAGATTCTGGAACTGGACTTAGAGGTACTGGA 14 0 0 
AGATAAAAACGCGAGGGAGTTTTTCAAGAGAGTCTCACTGGGAGAAGATT 
TGAACAAAGTCATAGAAAACTTCCCAAAAGAGCTGAAAGACTGGATTTTT 150 0 
GAGAC AAT AGAAAG CAT TCCTCCTC C AAAGGAT C C CGAGAAATT C CT CGG 
TGACCTCTCCGAAAAGTTGAAAATCCGACGGATAGAGAGACGTATCGCAG 16 0 0 
AAATAGATGATATGATAAAGAAAGCTTCAAACGATGAAGAAAGGCGTCTT 
CTTCTCTCTATGAAAGTGGATCTCCTCAGAAAAATAAAGAGGAGG 16 95 



FIG. 70 



MIPREVIEEIKEKVDIVEVISEYVNLTRVGSSYRALCPFHSETNPSFYVH 

PGLKIYHCFGCGASGDVIKFLQEMEGISFQEALERLAKRAGIDLSLYRTE 10 0 

GTSEYGKYIRLYEETWKRYVKELEKSKEAKDYLKSRGFSEEDIAKFGFGY 

VPKRS S I S I EVAEGMN I TLE ELVRYG I ALKKGDRFVDRFEGRI WP I KND 2 0 0 

SGHIVAFGGRALGNEEPKYLNSPETRYFSKKKTLFLFDEAKKVAKEVGFF 

VITEGYFDALAFRKDGIPTAVAVLGASLSREAILKLSAYSKNVILCFDND 3 0 0 

KAGFRATLKSLEDLLDYEFNVLVATPSPYKDPDELFQKEGEGSLKKMLKN 

S RS F E YF L VTAGE VF FDRNS PAGVRS YLS FLKGWVQKMRRKGYLKH I ENL 4 0 0 

VNEVSSSLQIPENQILNFFESDRSNTMPVHETKSSKVYDEGRGLAYLFLN 

YEDLREKILELDLEVLEDKNAREFFKRVSLGEDLNKVIENFPKELKDWIF 5 0 0 

ETIESIPPPKDPEKFLGDLSEKLKIRRIERRIAEIDDMIKKASNDEERRL 

LLSMKVDLLRKIKRR 565 
ATGGCTCTACACCCGGCTCACCCTGGGGCAATAATCGGGCACGAGGCCGT
TCTCGCCCTCCTTCCCCGCCTCACCGCCCAGACCCTGCTCTTCTCCGGCC 10 0 
CCGAGGGGGTGGGGCGGCGCACCGTGGCCCGCTGGTACGCCTGGGGGCTC 
AACCGCGGCTTCCCCCCGCCCTCCCTGGGGGAGCACCCGGACGTCCTCGA 2 0 0 
GGTGGGGCCCAAGGCCCGGGACCTCCGGGGCCGGGCCGAGGTGCGGCTGG 
AGGAGGTGGCGCCCCTCTTGGAGTGGTGCTCCAGCCACCCCCGGGAGCGG 3 0 0 
GTGAAGGTGGCCATCCTGGACTCGGCCCACCTCCTCACCGAGGCCGCCGC 
CAACGCCCTCCTCAAGCTCCTGGAGGAGCCCCCTTCCTACGCCCGCATCG 4 0 0 
TCCTCATCGCCCCAAGCCGCGCCACCCTCCTCCCCACCCTGGCCTCCCGG 
GCCACGGAGGTGGCATTCGCCCCCGTGCCCGAGGAGGCCCTGCGCGCCCT 50 0 
CACCCAGGACCCGGAGCTCCTCCGCTACGCCGCCGGGGCCCCGGGCCGCC 
TCCTTAGGGCCCTCCAGGACCCGGAGGGGTACCGGGCCCGCATGGCCAGG 6 0 0 
GCGCAAAGGGTCCTGAAAGCCCCGCCCCTGGAGCGCCTCGCTTTGCTTCG 
GGAGCTTTTGGCCGAGGAGGAGGGGGTCCACGCCCTCCACGCCGTCCTAA 70 0 
AGCGCCCGGAGCACCTCCTTGCCCTGGAGCGGGCGCGGGAGGCCCTGGAG 
GGGTACGTGAGCCCCGAGCTGGTCCTCGCCCGGCTGGCCTTAGACTTAGA 8 0 0 
GACA 



FIG. 72 



MALHPAHPGAIIGHEAVLALLPRLTAQTLLFSGPEGVGRRTVARWYAWGL 

NRGFPPPSLGEHPDVLEVGPKARDLRGRAEVRLEEVAPLLEWCSSHPRER 10 0 
VKVAILDSAHLLTEAAANALLKLLEEPPSYARIVLIAPSRATLLPTLASR 

ATEVAFAPVPEEALRALTQDPELLRYAAGAPGRLLRALQDPEGYRARMAR 2 0 0 
AQRVLKAPPLERLALLRELLAEEEGVHALHAVLKRPEHLLALERAREALE 

GYVSPELVLARLALDLET 268 
ATGCTGGACCTGAGGGAGGTGGGGGAGGCGGAGTGGAAGGCCCTAAAGCC

CCTTTTGGAAAGCGTGCCCGAGGGCGTCCCCGTCCTCCTCCTGGACCCTA 10 0 
AGCCAAGCCCCTCCCGGGCGGCCTTCTACCGGAACCGGGAAAGGCGGGAC 

TTCCCCACCCCCAAGGGGAAGGACCTGGTGCGGCACCTGGAAAACCGGGC 2 0 0 
CAAGCGCCTGGGGCTCAGGCTCCCGGGCGGGGTGGCCCAGTACCTGGCCT 

CCCTGGAGGGGGACCTCGAGGCCCTGGAGCGGGAGCTGGAGAAGCTTGCC 3 0 0 
CTCCTCTCCCCACCCCTCACCCTGGAGAAGGTGGAGAAGGTGGTGGCCCT 

GAGGCCCCCCCTCACGGGCTTTGACGTGGTGCGCTCCGTCCTGGAGAAGG 4 00 
ACCCCAAGGAGGCCCTCCTGCGCCTAGGCGGCCTCAAGGAGGAGGGGGAG 

GAGCCCCTCAGGCTCCTCGGGGCCCTCTCCTGGCAGTTCGCCCTCCTCGC 5 0 0 
C CGGGC CTT CTTCCTC CT C CGGGAAAAC C C CAGG C C CAAGGAGGAGGACC 

TCGCCCGCCTCGAGGCCCACCCCTACGCCGCCCGCCGCGCCCTGGAGGCG 6 0 0 
GCGAAGCGCCTCACGGAAGAGGCCCTCAAGGAGGCCCTGGACGCCCTCAT 

GGAGGCGGAAAAGAGGGCCAAGGGGGGGAAAGACCCGTGGCTCGCCCTGG 7 0 0 
AGGCGGCGGTCCTCCGCCTCGCCCGTTGA 
MVIAFTGDPFLAREALLEEARLRGLSRFTEPTPEALAQALAPGLFGGGGA

MLDLREVGEAEWKALKPLLESVPEGVPVLLLDPKPSPSRAAFYRNRERRD 10 0 

FPTPKGKDLVRHLENRAKRLGLRLPGGVAQYLASLEGDLEALERELEKLA 

LLSPPLTLEKVEKWALRPPLTGFDLVRSVLEKDPKEALLRLGGLKEEGE 200 

EPLRLLGALSWQFALLARAFFLLRENPRPKEEDLARLEAHPYAARRALEA 

AKRL TE E AL KE ALD ALME AE KRAKGGKD P WL AL E AAVL RL AR 2 92 
ATGGCTCGAGGCCTGAACCGCGTTTTCCTCATCGGCGCCCTCGCCACCCG
GCCGGACATGCGCTACACCCCGGCGGGGCTCGCCATTTTGGACCTGACCC 10 0 
TCGCCGGTCAGGACCTGCTTCTTTCCGATAACGGGGGGGAACCGGAGGTG 
TCCTGGTACCACCGGGTGAGGCTCTTAGGCCGCCAGGCGGAGATGTGGGG 2 0 0 
CGACCTCTTGGACCAAGGGCAGCTCGTCTTCGTGGAGGGCCGCCTGGAGT 
ACCGCCAGTGGGAAAGGGAGGGGGAGAAGCGGAGCGAGCTCCAGATCCGG 3 0 0 
GCCGACTTCCGGACCCCCTGGACGACCGGGGGAAGAAGCGGGCGGAGGAC 
AGCCGGGGCCAGCCCAGGCTCCGCGCCGCCCTGAACCAGGTCTTCCTCAT 4 0 0 
GGGCAACCTGACCCGGGACCCGGAACTCCGCTACACCCCCCAGGGCACCG 
CGGTGGCCCGGCTGGGCCTGGCGGTGAACGAGCGCCGCCAGGGGGCGGAG 5 0 0 
GAGCGCACCCACTTCGTGGAGGTTCAGGCCTGGCGCGACCTGGCGGAGTG 
GGCCGCCGAGCTGAGGAAGGGCGACGGCCTTTTCGTGATCGGCAGGTTGG 600
TGAACGACTCCTGGACCAGCTCCAGCGGCGAGCGGCGCTTCCAGACCCGT 
GTGGAGGCCCTCAGGCTGGAGCGCCCCACCCGTGGACCTGCCCAGGCCTG 7 0 0 
CCCAGGCCGGCGGAACAGGTCCCGCGAAGTCCAGACGGGTGGGGTGGACA 
TTGACGAAGGCTTGGAAGACTTTCCGCCGGAGGAGGATTTGCCGTTTTGA 8 0 0 
GCACGAA 



FIG. 76 



MARGLNRVFLIGALATRPDMRYTPAGLAILDLTLAGQDLLLSDNGGEPEV 
SWYHRVRLLGRQAEMWGDLLDQGQLVFVEGRLEYRQWEREGEKRSELQIR 
ADFLDPLDDRGKKRAEDSRGQPRLRAALNQVFLMGNLTRDPELRYTPQGT 
AVARLGLAVNERRQGAEERTHFVEVQAWRDLAEWAAELRKGDGLFVIGRL 
VNDSWTSSSGERRFQTRVEALRLERPTRGPAQACPGRRNRSREVQTGGVD 
IDEGLEDFPPEEDLPF 
AATTCCGACATTTCAATTGAATCGTTTATTCCGCTTGAAAAAGAAGGCAA
GTTGCTCGTTGATGTGAAAAGACCGGGGAGCATCGTACTGCAGGCGCGCT 10 0 
TTTTCTCTGAAATCGTGAAAAAACTGCCGCAACAAACGGTGGAAATCGAA 
ACGGAAGACAACTTTTTGACGATCATCCGCTCGGGGCACTCAGAATTCCG 2 0 0 
CCTCAATGGGCTAAACGCCGACGAATATCCGCGCCTGCCGCAAATTGAAG 
AAGAAAACGTGTTTCAAATCCCGGCTGATTTATTGAAAACCGTGATTCGG 3 0 0 
CAAACGGTGTTCGCCGTTTCTACATCGGAAACGCGCCCAATCTTGACAGG 
TGTCAACTGGAAAGTTGAACATGGCGAGCTTGTCTGCACAGCGACCGACA 400 
GTCATCGCTTAGCCATGCGCAAAGTGAAAATTGAGTCGGAAAATGAAGTA 
TCATACAACGTCGTCATCCCTGGAAAAAGTCTTAATGAGCTCAGCAAAAT 50 0 
TTTGGATGACGGCAACCACCCGGTGGACATCGTCATGACAGCCAATCAAG 
TGCTATTTAAGGCCGAGCACCTTCTCTTCTTTTCCCGGCTGCTTGACGGC 6 0 0 
AACTATCCGGAGACGGCCCGCTTGATTCCAACAGAAAGCAAAACGACCAT 
GATCGTCAATGCAAAAGAGTTTCTGCAGGCAATCGACCGAGCGTCCTTGC 70 0 
TTGCTCGAGAAGGAAGGAACAACGTTGTGAAACTGACGACGCTTCCTGGA 
GGAATGCTCGAAATTTCTTCGATTTCTCCGAGATCGGGAAAGTGACGGAG 8 0 0 
CAGCTGCAAACGGAGTCTCTTGAAGGGGAAGAGTTGAACATTTCGTTCAG 
CGCGAAATATATGATGGACGCGTTGCGGGCGCTTGATGGAACAGACATTT 9 0 0 
CAAATCAGCTTCACTGGGGCCATGCGGCCGTTCCTGTTGCGCCCGCTTCA 
ACCGATTCGATGCTTCAGCTCATTTTGCCGGTGAGAACATAT 9 92 
NSDISIIESFIPLEKEGKLLVDVKRPGSIVLQARFFSEIVKKLPQQTVEI
ETEDNFLTI IRSGHSEFRLNGLNADEYPRLPQIEEENVFQ I PADLLKTVI 100 
RQTVFAVSTSETRPILTGVNWKVEHGELVCTATDSHRLAMRKVKIIESEN 
EVSYNWIPGKSLNELSKI ILDDGNHPVDIVMTANQVLFKAEHLLFFSRL 200 
LDGNYPETARLIPTESKTTMIVNAKEFLQAIDRASLLAREGRNNWKLTT 
LPGGMLEISSISPEIGKVTEQLQTESLEGEELNISFSAKYMMDALRALDG 300 
TDIQI S FTGAMRPFLLRPLHTDSMLQL ILPVRTY 
ATGATTAACCGCGTCATTTTGGTCGGCAGGTTAACGAGAGATCCGGAGTT
GCGTTACACTCCAAGCGGAGTGGCTGTTGCCACGTTTACGCTCGCGGTCA 10 0 
ACCGTCCGTTTACAAATCAGCAGGGCGAGCGGGAAACGGATTTTATTCAA 

GGGGAGCTTGGCTGGTGTCGATGGCCGACTGCAAACCCGCAGCTATGAAA 

ATCAAGAAGGTCGGCGTGTGTACGTGACGGAAGTGGTGGCTGATAGCGTC 3 0 0 
CAATTTCTTGAGCCGAAAGGAACGAGCGAGCAGCGAGGGGCGACAGCAGG 

CGGCTACTATGGGGAT C CATT C C CAT T CGGGCAAGAT CAGAAC C ACCAAT 4 0 0 
ATCCGAACGAAAAAGGGTTTGGCCGCATCGATGACGATCCTTTCGCCAAT 

GACGGCCAGCCGATCGATATTTCTGATGATGATTTGCCGTTT 4 9 2 



FIG. 80 



MINRVILVGRLTRDPELRYTPSGVAVATFTLAVNRPFTNQSYENQEGRRV 
YVTEWADSVQFLEPKGTSEQRGATAGGYYQGERETDFIQCWWRRQAEN 100 
VANFLKKGSLAGVDGRLQTRGDPFPFGQDQNHQYPNEKGFGRIDDDPFAN 
DGQPIDISDDDLPF 164 
ATGCTGGAACGCGTATGGGGAAACATTGAAAAACGGCGTTTTTCTCCCCT
TTATTTATTATACGGCAATGAGCCGTTTTTATTAACGGAAACGTATGAGC 10 0 
GATTGGTGAACGCAGCGCTTGGCCCCGAGGAGCGGGAGTGGAACTTGGCT 
GTGTACGACTGCGAGGAAACGCCGATCGAGGCGGCGCTTGAGGAGGCCGA 2 0 0 
GACGGTGCCGTTTTTCGGCGAGCGGCGTGTCATTCTCATCAAGCATCCAT 
ATTTTTTTACGTCTGAAAAAGAGAAGGAGATCGAACATGATTTGGCGAAG 3 0 0 
CTGGAGGCGTACTTGAAGGCGCCGTCGCCGTTTTCGATCGTCGTCTTTTT 
CGCGCCGTACGAGAAGCTTGATGAGCGAAAAAAAATTACGAAGCTCGCCA 4 0 0 
AAGAGCAAAGCGAAGTCGTCATCGCCGCCCCGCTCGCCGAAGCGGAGCTG 
CGTGCCTGGGTGCGGCGCCGCATCGAGAGCCAAGGGGCGCAAGCAAGCGA 500 
CGAGGCGATTGATGTCCTGTTGCGGCGGGCCGGGACGCAGCTTTCCGCCT 
TGGCGAATGAAATCGATAAATTGGCCCTGTTTGCCGGATCGGGCGGAACC 60 0 
ATCGAGGCGGCGGCGGTTGAGCGGCTTGTCGCCCGCACGCCGGAAGAAAA 
CGTATTTGTGCTTGTCGAGCAAGTGGCGAAGCGCGACATTCCAGCAGCGT 700 
TGCAGACGTTTTATGATCTGCTTGAAAACAATGAAGAGCCGATCAAAATT 
TTGGCGTTGCTCGCCGCCCATTTCCGCTTGCTTTCGCAAGTGAAATGGCT 800 
TGCCTCCTTAGGCTACGGACAGGCGCAAATTGCTGCGGCGCTCAAGGTGC 
ACCCGTTCCGCGTCAAGCTCGCTCTTGCTCAAGCGGCCCGCTTCGCTGAC 90 0 
GGAGAGCTTGCTGAGGCGATCAACGAGCTCGCTGACGCCGATTACGAAGT 
GAAAAGCGGGGCGGTCGATCGCCGGTTGGCCGTTGAGCTGCTTCTGATGC 1000 
GCTGGGGCGCCCGCCCGGCGCAAGCGGGGCGCCACGGCCGGCGG 
MLERVWGNIEKRRFSPLYLLYGNEPFLLTETYERLVNAALGPEEREWNLA

VYDCEETP I EAALEEAETVPFFGERRVI L I KHP YFFTSEKEKE IEHDLAK 10 0 
LEAYLKAPSPFSIWFFAPYEKLDERKKITKLAKEQSEWIAAPLAEAEL 

RAWVRRR I ESQGAQASDEAI DVLLRRAGTQLS ALANE I DKLALFAGSGGT 2 0 0 
I EAAAVERLVARTPEENVFVLVEQVAKRD I PAALQTF YDLLENNEEP I KI 

LALLAAHFRLLS QVKWLAS LGYGQAQ I AAALKVH P FRVKLALAQAARFAD 3 00 
GE LAE A I NE LADAD YE VKS GAVDRRLAVE L L LMRWGAR PAQAGRHGRR 
ATGCGATGGGAACAGCTAGCGAAACGCCAGCCGGTGGTGGCGAAAATGCT

GCAAAGCGGCTTGGAAAAAGGGCGGATTTCTCATGCGTACTTGTTTGAGG 10 0 
GGCAGCGGGGGACGGGCAAAAAAGCGGCCAGTTTGTTGTTGGCGAAACGT 

TTGTTTTGTCTGTCCCCAATCGGAGTTTCCCCGTGTCTAGAGTGCCGCAA 2 0 0 
CTGCCGGCGCATCGACTCCGGCAACCACCCTGACGTCCGGGTGATCGGCC 

CAGATGGAGGATCAATCAAAAAGGAACAAATCGAATGGCTGCAGCAAGAG 3 00 
TTCTCGAAAACAGCGGTCGAGTCGGATAAAAAAATGTACATCGTTGAGCA 

CGCCGATCAAATGACGACAAGCGCTGCCAACAGCCTTCTGAAATTTTTGG 4 00 
AAGAGCCGCATCCGGGGACGGTGGCGGTATTGCTGACTGAGCAATACCAC 

CGCCTGCTAGGGACGATCGTTTCCCGCTGTCAAGTGCTTTCGTTCCGGCC 5 0 0 
GTTGCCGCCGGCAGAGCTCGCCCAGGGACTTGTCGAGGAGCACGTGCCGT 

TGCCGTTGGCGCTGTTGGCTGCCCATTTGACAAACAGCTTCGAGGAAGCA 6 0 0 
CTGGCGCTTGCCAAAGATAGTTGGTTTGCCGAGGCGCGAACATTAGTGCT 

ACAATGGTATGAGATGCTGGGCAAGCCGGAGCTGCAGCTTTTGTTTTTCA 70 0 

GGACTTG 75 7 
MRWEQLAKRQPWAKMLQSGLEKGRISHAYLFEGQRGTGKKAASLLLAKR

LFCLSPIGVSPCLECRNCRRIDSGNHPDVRVIGPDGGSIKKEQIEWLQQE 10 0 

FSKTAVESDKKMYIVEHADQMTTSAANSLLKFLEEPHPGTVAVLLTEQYH 

RLLGTIVSRCQVLSFRPLPPAELAQGLVEEHVPLPLALLAAHLTNSFEEA 2 0 0 

LALAKDSWFAEARTLVLQWYEMLGKPELQLLFFIHDRLFPHFLESHQLDL 

GL 252 
GTGGCATACCAAGCGTTATATCGCGTGTTTCGGCCGCAGCGCTTTGCGGA

CATGGTCGGCCAAGAACACGTGACCAAGACGTTGCAAAGCGCCCTGCTTC 10 0 

AACATAAAATATCGCACGCTTACTTATTTTCCGGCCCGCGCGGTACAGGA 

AAAACGAGCGCAGCGAAAATTTTCGCCAAGGCGGTCAACTGTGAACAGGC 2 0 0 

GCCAGCGGCGGAGCCATGCAATGAGTGTCCAGCTTGCCTCGGCATTACGA 

ATGGAACGGTTCCCGATGTGCTGGAAATTGACGCTGCTTCCAACAACCGC 3 0 0 

GTCGATGAAATT CGTGATAT C CGTGAGAAGGTGAAATTTGCGCCAACGTC 

GGCCCGCTACAAAGTGTATATCATCGACGAGGTGCATATGCTGTCGATCG 4 0 0 

GTGCGTTTAACGCGCTGTTGAAAACGTTGGAGGAGCCGCCGAAACACGTC 

ATTTTCATTTTGGCCACGACCGAGCCGCACAAAATTCCGGCGACGATCAT 5 0 0 

TTCCCGCTGCCAACGGTTCGATTTTCGCCGCATCCCGCTTCAGGCGATCG 

TTTCACGGCTAAAGTACGTCGCAAGCGCCCAAGGTGTCGAGGCGTCAGAT 6 0 0 

GAGGCATTGTCCGCCATCGCCCGTGCTGCAGACGGGGGGATGCGCGATGC 

GCTCAGCTTGCTTGATCAAGCCATTTCGTTCAGCGACGGGAAACTTCGGC 7 0 0 

TCGACGACGTGCTGGCGATGACCGGGGCTGCATCATTTGCCGCCTTATCG 

AGCTTCATCGAAGCCATCCACCGCAAAGATACAGCGGCGGTTCTTCAGCA 8 0 0 

CTTGGAAACGATGATGGCGCAAGGGAAAGATCCGCATCGTTTGGTTGAAG 

ACTTGATTTTGTACTATCGCGATTTATTGCTGTACAAAACCGCTCCCTAT 9 0 0 

GTGGAGGGAGCGATTCAAATTGCTGTCGTTGACGAAGCGTTCACTTCACT 

GTCGGAAATGATTCCGGTTTCCAATTTATACGAGGCCATCGAGTTGCTGA 10 0 0 

ACAAAAGCCAGCAAGAGATGAAGTGGACAAACCACCCGCGCCTTCTGTTG 

GAAGTGGCGCTTGTGAAACTTTGCCATCCATCAGCCGCCGCCCCGTCGCT 110 0 

GTCGGCTTCCGAGTTGGAACCGTTGATAAAGCGGATTGAAACGCTGGAGG 

CGGAATTGCGGCGCCTGAAGGAACAACCGCCTGCCCCTCCGTCGACCGCC 12 0 0 

GCGCCGGTGAAAAAACTGTCCAAACCGATGAAAACGGGGGGATATAAAGC 

CCCGGTTGGCCGCATTTACGAGCTGTTGAAACAGGCGACGCATGAAGATT 13 0 0 

TAGCTTTGGTGAAAGGATGCTGGGCGGATGTGCTCGACACGTTGAAACGG 

CAGCATAAAGTGTCGCACGCTGCCTTGCTGCAAGAGAGCGAGCCGGTTGC 14 0 0 

AGCGAGCGCCTCAGCGTTTGTATTAAAATTCAAATACGAAATCCACTGCA 

AAATGGCGACCGATCCCACAAGTTCGGTCAAAGAAAACGTCGAAGCGATT 15 0 0 

TTGTTTGAGCTGACAAACCGCCGCTTTGAAATGGTAGCCATTCCGGAGGG 

AGAATGGGGAAAAATAAGAGAAGAGTTCATCCGCAATAAGGACGCCATGG 16 0 0 

TGGAAAAAAGCGAAGAAGATCCGTTAATCGCCGAAGCGAAGCGGCTGTTT 

GG CGAAGAG CTGAT CGAAATTAAAGAA 16 77 
VAYQALYRVFRPQRFADMVGQEHVTKTLQSALLQHKISHAYLFSGPRGTG

KT SAAKI F AKAVNCE CAP AAE P CNE C P AC LG I TNGTVPDVLE I DAASNNR 10 0 

VDE I RD I REKVKFAPTS ARYKVY I IDEVHMLS I GAFNALLKTLEE PPKHV 

IFILATTEPHKIPATI ISRCQRFDFRRIPLQAIVSRLKYVASAQGVEASD 2 00 

EALSAIARAADGGMRDALSLLDQAISFSDGKLRLDDVLAMTGAASFAALS 

S F I EAI HRKDTAAVLQHLE TMMAQGKD PHRLVEDL I LYYRDLLLYKTAPY 3 0 0 

VEGAIQIAWDEAFTSLSEMIPVSNLYEAIELLNKSQQEMKWTNHPRLLL 

EVALVKL CHP S AAAPS LS AS ELEPLIKRI E TLEAELRRLKEQ P PAP P S TA 4 0 0 

APVKKLSKPMKTGGYKAPVGRIYELLKQATHEDLALVKGCWADVLDTLKR 

QHKVSHAALLQESEPVAASASAFVLKFKYE IHCKMATDPTSSVKENVEAI 5 0 0 

LFELTNRRFEMVAI PEGEWGKI REEF I RNKDAMVEKSEEDPL I AEAKRLF 

GEELIEIKE 559 
ATGGTGACAAAAGAGCAAAAAGAGCGGTTTCTCATCCTGCTTGAGCAGCT
GAAGATGACGTCGGACGAATGGATGCCGCATTTTCGTGAGGCAGCCATTC 10 0 
GCAAAGTCGTGATCGATAAAGAGGAGAAAAGCTGGCATTTTTATTTTCAG 
TTCGACAACGTGCTGCCGGTTCATGTATACAAAACGTTTGCCGATCGGCT 2 0 0 
GCAGACGGCGTTCCGCCATATCGCCGCCGTCCGCCATACGATGGAGGTCG 
AAGCGCCGCGCGTAACTGAGGCGGATGTGCAGGCGTATTGGCCGCTTTGC 3 0 0 
CTTGCCGAGCTGCAAGAAGGCATGTCGCCGCTTGTCGATTGGCTCAGCCG 
GCAGACGCCTGAGCTGAAAGGAAACAAGCTGCTTGTCGTTGCCCGCCATG 4 0 0 
AAGCGGAAGCGCTGGCGATCAAACGGCGGTTCGCCAAAAAAATCGCTGAT 
GTGTACGCTTCGTTTGGGTTTCCCCCCCTTCAGCTTGACGTCAGCGTCGA 50 0 
GCCGTCCAAGCAAGAAATGGAACAGTTTTTGGCGCAAAAACAGCAAGAGG 
ACGAAGAGCGAGCGCTTGCTGTACTGACCGATTTAGCGAGGGAAGAAGAA 60 0 
AAGGCCGCGTCTGCGCCGCCGTCCGGTCCGCTTGTCATCGGCTATCCGAT 
CCGCGACGAGGAGCCGGTGCGGCGGCTTGAAACGATCGTCGAAGAAGAGC 7 0 0 
GGCGCGTCGTTGTGCAAGGCTATGTATTTGACGCCGAAGTGAGCGAATTA 
AAAAGCGGCCGCACGCTGTTGACCATGAAAATCACAGATTACACGAACTC 8 0 0 
GATTTTAGTCAAAATGTTCTCGCGCGACAAAGAGGACGCCGAGCTTATGA 
GCGGCGTCAAAAAAGGCATGTGGGTGAAAGTGCGCGGCAGCGTGCAAAAC 90 0 
GATACGTTCGTCCGTGATTTGGTCATCATCGCCAACGATTTGAACGAAAT 
CGCCGCAAACGAACGGCAAGATACGGCGCCGGAAGGGGAAAAGAGGGTCG 10 0 0 
AGCTCCATTTGCATACCCCGATGAGCCAAATGGACGCGGTCACCTCGGTG 
ACAAAACTCATTGAGCAAGCGAAAAAATGGGGGCATCCGGCGATCGCCGT 110 0 
CACCGACCATGCCGTTGTTCAGTCGTTTCCGGAGGCCTACAGCGCGGCGA 
AAAAACACGGCATGAAGGTCATTTACGGCCTTGAGGCGAACATCGTCGAC 12 00 
GATGGCGTGCCGATCGCCTACAATGAGACGCACCGCCGTCTTTCGGAGGA 
AACGTACGTCGTCTTTGACGTCGAGACGACGGGCCTGTCGGCTGTGTACA 1300 
ATACGATCATTGAGCTGGCGGCGGTGAAAGTGAAAGACGGCGAGATCATC 
GACCGATTCATGTCGTTTGCCAACCCTGGACATCCGTTGTCGGTGACAAC 14 00 
GATGGAGCTGACTGGGATCACCGATGAGATGGTGAAAGACGCCCCGAAGC 
CGGACGAGGTGCTAGCCCGTTTTGTTGACTGGGCCGGCGATGCGACGCTT 1500 
GTTGCCCACAACGCCAGCTTTGACATCGGTTTTTTAAACGCGGGCCTCGC 
TCGCATGGGGCGCGGCAAAATCGCGAATCCAGTCATCGATACGCTCGAGC 1600 
TGGCCCGTTTTTTATACCCGGATTTGAAAAACCATCGGCTCAATACATTG 
TGCAAAAAATTTGACATTGAATTGACGCAGCATCACCGCGCCATCTACGA 17 00 
CGCGGAGGCGACCGGGCATTTGCTTATGCGGCTGTTGAAGGAAGCGGAAG 
AGCGCGGCATACTGTTTCATGACGAATTAAACAGCCGCACGCACAGCGAA 18 00 
GCGTCCTATCGGCTTGCGCGCCCGTTCCATGTGACGCTGTTGGCGCAAAA 
CGAGACTGGATTGAAAAATTTGTTCAAGCTTGTGTCATTGTCGCACATTC 1900 
AATATTTTCACCGTGTGCCGCGCATCCCGCGCTCCGTGCTCGTCAAGCAC 
CGCGACGGCCTGCTTGTCGGCTCGGGCTGCGACAAAGGAGAGCTGTTTGA 2 000 
CAACTTGATCCAAAAGGCGCCGGAAGAAGTCGAAGACATCGCCCGTTTTT 
ACGATTTTCTTGAAGTGCATCCGCCGGACGTGTACAAGCCGCTCATCGAG 210 0 
ATGGATTATGT G AAA G AC G AAG AGAT GAT C AAAAAC AT CATCCGCAGCAT 
CGTCGCCCTTGGTGAGAAGCTTGACATCCCGGTTGTCGCCACTGGCAACG 22 00 
TCCATTACTTGAACCCAGAAGATAAAATTTACCGGAAAATCTTAATCCAT
TCGCAAGGCGGGGCGAATCCGCTCAACCGCCATGAACTGCCGGATGTATA 2300 
TTTCCGTACGACGAATGAAATGCTTGACTGCTTCTCGTTTTTAGGGCCGG 
AAAAAGCGAAGGAAATCGTCGTTGACAACACGCAAAAAATCGCTTCGTTA 24 0 0 
ATCGGCGATGTCAAGCCGATCAAAGATGAGCTGTATACGCCGCGCATTGA 
AGGGGCGGACGAGGAAATCAGGGAAATGAGCTACCGGCGGGCGAAGGAAA 2500 
TTTACGGCGACCCGTTGCCGAAACTTGTTGAAGAGCGGCTTGAGAAGGAG 
CTAAAAAGCATCATCGGCCATGGCTTTGCCGTCATTTATTTGATCTCGCA 2 600 
CAAGCTTGTGAAAAAATCGCTCGATGACGGCTACCTTGTCGGGTCGCGCG 
GATCGGTCGGCTCGTCGTTTGTCGCGACGATGACGGAAATCACCGAGGTC 2 7 00 
AATCCGCTGCCGCCGCATTACGTTTGCCCGAACTGCAAGCATTCGGAGTT 
CTTTAACGACGGTTCAGTCGGCTCAGGGTTTGATTTGCCGGATAAAAACT 28 00 
GCCCGCGATGTGGGACGAAATACAAGAAAGACGGGCACGACATCCCGTTT 
GAGACGTTTCTCGGCTTTAAAGGCGACAAAGTGCCGGATATCGACTTGAA 2 900 
CTTTTCCGGCGAATACCAGCCGCGCGCCCACAACTATACGAAAGTGCTGT 
TTGGCGAAGACAACGTCTACCGCGCCGGGACGATTGGCACGGTCGCTGAC 30 0 0 
AAAACGGCGTACGGATTTGTCAAAGCGTATGCGAGCGACCATAACTTAGA 
GCTGCGCGGCGCGGAAATCGACGGCTCGCGGCTGGCTGCACCGGGGTGAA 310 0 
GCGGACGACCGGGCAGCATCCGGGCGGCATCATCGTCGTCCCGGATTATA 
TGGAAATTTACGATTTTACGCCGATTCAATATCCGGCCGATGACACGTCC 32 0 0 
TCTGAATGGCGGACGACCCATTTCGACTTCCATTCGATCCACGACAATTT 
GTTGAAGCTCGATATTCTCGGGCACGACGATCCGACGGTCATTCGCATGC 3300 
TGCAAGATTTAAGCGGCATCGATCCGAAAACGATCCCGACCGACGACCCG 
GATGTGATGGGCATTTTCAGCAGCACCGAGCCGCTTGGCGTTACGCCGGA 34 00 
GCAAATCATGTGCAATGTCGGCACGATCGGCATTCCGGAGTTTGGCACGC 
GCTTCGTTCGGCAAATGTTGGAAGAGACAAGGCCAAAAACGTTTTCCGAA 35 0 0 
CTCGTGCAAATTTCCGGCTTGTCGCACGGCACCGATGTGTGGCTCGGCAA 
CGCGCAAGAGCTCATTCAAAACGGCACGTGTACGTTATCGGAAGTCATCG 3 60 0 
GCTGCCGCGACGACATTATGGTCTATTTGATTTACCGCGGGCTCGAGCCG 
TCGCTCGCTTTTAAAATCATGGAATCCGTGCGCAAAGGAAAAGGCTTAAC 3 7 0 0 
GCCGGAGTTTGAAGCAGAAATGCGCAAACATGACGTGCCGGAGTGGTACA 
TCGATTCATGCAAAAAAATCAAGTACATGTTCCCGAAAGCGCACGCCGCC 38 0 0 
GCCTACGTGTTAATGGCGGTGCGCATCGCCTACTTTAAGGTGCACCATCC 
GCTTTTGTATTACGCGTCGTACTTTACGGTGCGGGCGGAGGACTTTGACC 3 90 0 
TTGACGCCATGATCAAAGGATCACCCGCCATTCGCAAGCGGATTGAGGAA 
ATCAACGCCAAAGGCATTCAGGCGACGGCGAAAGAAAAAAGCTTGCTCAC 4 0 0 0 
GGTTCTTGAGGTGGCCTTAGAGATGTGCGAGCGCGGCTTTTCCTTTAAAA 
ATATCGATTTGTACCGCTCGCAGGCGACGGAATTCGTCATTGACGGCAAT 4100 
TCTCTCATTCCGCCGTTCAACGCCATTCCGGGGCTTGGGACGAACGTGGC 
GCAGGCGATCGTGCGCGCCCGCGAGGAAGGCGAGTTTTTGTCGAAGGAGG 4 2 0 0 
ATTTGCAACAGCGCGGCAAATTGTCGAAAACGCTGCTCGAGTATCTAGAA 
AGCCGCGGCTGCCTTGACTCGCTTCCAGACCATAACCAGCTGTCGCTGTT 4 300 
MVTKEQKERFLILLEQLKMTSDEWMPHFREAAIRKVVIDKEEKSWHFYFQ

FDNVLPVHVYKTFADRLQTAFRHIAAVRHTMEVEAPRVTEADVQAYWPLC 10 0 

LAELQEGMSPLVDWLSRQTPELKGNKLLVVARHEAEALAIKRRFAKKIAD 

VYASFGFPPLQLDVSVEPSKQEMEQFLAQKQQEDEERALAVLTDLAREEE 2 00 

KAASAPPSGPLVIGYPIRDEEPVRRLETIVEEERRVVVQGYVFDAEVSEL 

KSGRTLLTMKITDYTNS ILVKMFSRDKEDAELMSGVKKGMWVKVRGSVQN 30 0 

DTFVRDLVI IANDLNEIAANERQDTAPEGEKRVELHLHTPMSQMDAVTSV 

T KL I EQAKKWGH PAI AVT DHAVVQS FPEAYSAAKKHGMKVI YGLEAN I VD 4 0 0 

DGVPIAYNETHRRLSEETYWFDVETTGLSAVYNTIIELAAVKVKDGEII 

DRFMSFANPGHPLSVTTMELTGITDEMVKDAPKPDEVLARFVDWAGDATL 500 

VAHNASFDIGFLNAGLARMGRGKIANPVIDTLELARFLYPDLKNHRLNTL 

CKKFDIELTQHHRAIYDAEATGHLLMRLLKEAEERGILFHDELNSRTHSE 600 

ASYRLARPFHVTLLAQNETGLKNLFKLVSLSHIQYFHRVPRIPRSVLVKH 

RDGLLVGSGCDKGELFDNLIQKAPEEVEDIARFYDFLEVHPPDVYKPLIE 7 00 

MDYVKDEEMIKNI IRSIVALGEKLDIPVVATGNVHYLNPEDKI YRKILIH 

SQGGANPLNRHELPDVYFRTTNEMLDCFSFLGPEKAKEI VVDNTQKIASL 8 00 

IGDVKPIKDELYTPRIEGADEEIREMSYRRAKEI YGDPLPKLVEERLEKE 

LKSI IGHGFAVIYLISHKLVKKSLDDGYLVGSRGSVGSSFVATMTEITEV 90 0 

NPLPPHYVCPNCKHSEFFNDGSVGSGFDLPDKNCPRCGTKYKKDGHDIPF 

ETFLGFKGDKVPDIDLNFSGEYQPRAHNYTKVLFGEDNVYRAGTIGTVAD 1000 

KTAYGFVKAYASDHNLELRGAEIDLAAGCTGVKRTTGQHPGGIIVVPDYM 

EIYDFTPIQYPADDTSSEWRTTHFDFHS IHDNLLKLDILGHDDPTVIRML 110 0 

QDLSGIDPKTIPTDDPDVMGIFSSTEPLGVTPEQIMCNVGTIGIPEFGTR 

FVRQMLEETRPKTFSELVQISGLSHGTDVWLGNAQELIQNGTCTLSEVIG 1200 

CRDDIMVYLIYRGLEPSLAFKIMESVRKGKGLTPEFEAEMRKHDVPEWYI 

DSCKKIKYMFPKAHAAAYVLMAVRIAYFKVHHPLLYYAS YFTVRAEDFDL 1300 

DAMIKGSPAIRKRIEEINAKGIQATAKEKSLLTVLEVALEMCERGFSFKN 

IDLYRSQATEFVIDGNSLIPPFNAI PGLGTNVAQAIVRAREEGEFLSKED 14 00 

LQQRGKLSKTLLEYLESRGCLDSLPDHNQLSLF 
SEQUENCE LISTING



<110> O'Donnell, Michael E. 
Yuzhakov, Alexander 
Yurieva, Olga 
Jeruzalmi, David 
Bruck, Irina 
Kuriyan, John 

<120> ENZYMES DERIVED FROM THERMOPHILIC ORGANISMS THAT
      FUNCTION AS A CHROMOSOMAL REPLICASE, PREPARATION AND
      USE THEREOF

<130> 22221/1030 

<140> 
<141> 

<150> 60/143,202 
<151> 1997-04-08 

<150> 08/823,407 
<151> 1997-04-08 

<150> 09/057,416 
<151> 1998-04-08 

<160> 212 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 2007 
<212> DNA 

<213> Thermus thermophilus 
<400> 1 

tccgggggtg gggttcccag gtagaccccg gcccctcccg tgagcccctt tacccaggcc 60 
gccacctcct ccaggggggc caaggcgtgc aaggagagga acgtccgcac cacgccctat 120 
actagccttg tgagcgccct ctaccgccgc ttccgccccc tcaccttcca ggaggtggtg 180 
gggcaggagc acgtgaagga gcccctcctc aaggccatcc gggaggggag gctcgcccag 240 
gcctacctct tctccgggcc caggggcgtg ggcaagacca ccacggcgag gctcctcgcc 300
atggcggtgg ggtgccaggg ggaagacccc ccttgcgggg tctgccccca ctgccaggcg 360
gtgcagaggg gcgcccaccc ggacgtggtg gacattgacg ccgccagcaa caactccgtg 420
gaggacgtgc gggagctgag ggaaaggatc cacctcgccc ccctctctgc ccccaggaag 480
gtcttcatcc tggacgaggc ccacatgctc tccaaaagcg ccttcaacgc cctcctcaag 540
accctggagg agcccccgcc ccacgtcctc ttcgtcttcg ccaccaccga gcccgagagg 600
atgcccccca ccatcctctc ccgcacccag cacttccgct tccgccgcct cacggaggag 660
gagatcgcct ttaagctccg gcgcatcctg gaggccgtgg ggcgggaggc ggaggaggag 720
gccctcctcc tcctcgcccg cctggcggac ggggccctta gggacgcgga aagcctcctg 780
gagcgcttcc tcctcctgga aggccccctc acccggaagg aggtggagcg cgccctaggc 840
tcccccccag ggaccggggt ggccgagatc gccgcctccc tcgcgagggg gaaaacggcg 900
gaggccctgg gcctcgcccg gcgcctctac ggggaagggt acgccccgag gagcctggtc 960
tcgggccttt tggaggtgtt ccgggaaggc ctctacgccg ccttcggcct cgcgggaacc 1020
ccccttcccg ccccgcccca ggccctgatc gccgccatga ccgccctgga cgaggccatg 1080
gagcgcctcg cccgccgctc cgacgcctta agcctggagg tggccctcct ggaggcggga 1140
agggccctgg ccgccgaggc cctaccccag cccacgggcg ctccttcccc agaggtcggc 1200
cccaagccgg aaagcccccc gaccccggaa cccccaaggc ccgaggaggc gcccgacctg 1260
cgggagcggt ggcgggcctt cctcgaggcc ctcaggccca ccctacgggc cttcgtgcgg 1320
gaggcccgcc cggaggtccg ggaaggccag ctctgcctcg ctttccccga ggacaaggcc 1380
ttccactacc gcaaggcctc ggaacagaag gtgaggctcc tccccctggc ccaggcccat 1440
ttcggggtgg aggaggtcgt cctcgtcctg gagggagaaa aaaaaagcct gagcccaagg 1500
ccccgcccgg ccccacctcc tgaagcgccc gcacccccgg gccctcccga ggaggaggta 1560
gaggcggagg aagcggcgga ggaggccccg gaggaggcct tgaggcgggt ggtccgcctc 1620
ctgggggggc gggtgctctg ggtgcggcgg cccaggaccc gggaggcgcc ggaggaggaa 1680
cccctgagcc aagacgagat agggggtact ggtatataat gggggcatga cgcggaccac 1740
cgacctcgga caagagaccg tggacaacat cctcaagcgc ctccgccgta ttgagggcca 1800
ggtgcggggg ctccagaaga tggtggccga gggccgcccc tgcgacgagg tcctcaccca 1860
gatgaccgcc accaagaagg ccatggaggc ggcggccacc ctgatcctcc acgagttcct 1920
gaacgtctgc gccgccgagg tctccgaggg caaggtgaac cccaagaagc ccgaggagat 1980
cgccaccatg ctgaagaact tcatcta 2007



<210> 2 
<211> 529 
<212> PRT 

<213> Thermus thermophilus
<400> 2 

Met Ser Ala Leu Tyr Arg Arg Phe Arg Pro Leu Thr Phe Gln Glu Val
1 5 10 15

Val Gly Gln Glu His Val Lys Glu Pro Leu Leu Lys Ala Ile Arg Glu
20 25 30

Gly Arg Leu Ala Gln Ala Tyr Leu Phe Ser Gly Pro Arg Gly Val Gly
35 40 45

Lys Thr Thr Thr Ala Arg Leu Leu Ala Met Ala Val Gly Cys Gln Gly
50 55 60

Glu Asp Pro Pro Cys Gly Val Cys Pro His Cys Gln Ala Val Gln Arg
65 70 75 80

Gly Ala His Pro Asp Val Val Asp Ile Asp Ala Ala Ser Asn Asn Ser
85 90 95

Val Glu Asp Val Arg Glu Leu Arg Glu Arg Ile His Leu Ala Pro Leu
100 105 110

Ser Ala Pro Arg Lys Val Phe Ile Leu Asp Glu Ala His Met Leu Ser
115 120 125

Lys Ser Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro Pro Pro
130 135 140

His Val Leu Phe Val Phe Ala Thr Thr Glu Pro Glu Arg Met Pro Pro
145 150 155 160

Thr Ile Leu Ser Arg Thr Gln His Phe Arg Phe Arg Arg Leu Thr Glu
165 170 175

Glu Glu Ile Ala Phe Lys Leu Arg Arg Ile Leu Glu Ala Val Gly Arg
180 185 190

Glu Ala Glu Glu Glu Ala Leu Leu Leu Leu Ala Arg Leu Ala Asp Gly
195 200 205

Ala Leu Arg Asp Ala Glu Ser Leu Leu Glu Arg Phe Leu Leu Leu Glu
210 215 220

Gly Pro Leu Thr Arg Lys Glu Val Glu Arg Ala Leu Gly Ser Pro Pro
225 230 235 240

Gly Thr Gly Val Ala Glu Ile Ala Ala Ser Leu Ala Arg Gly Lys Thr
245 250 255

Ala Glu Ala Leu Gly Leu Ala Arg Arg Leu Tyr Gly Glu Gly Tyr Ala
260 265 270

Pro Arg Ser Leu Val Ser Gly Leu Leu Glu Val Phe Arg Glu Gly Leu
275 280 285

Tyr Ala Ala Phe Gly Leu Ala Gly Thr Pro Leu Pro Ala Pro Pro Gln
290 295 300

Ala Leu Ile Ala Ala Met Thr Ala Leu Asp Glu Ala Met Glu Arg Leu
305 310 315 320

Ala Arg Arg Ser Asp Ala Leu Ser Leu Glu Val Ala Leu Leu Glu Ala
325 330 335

Gly Arg Ala Leu Ala Ala Glu Ala Leu Pro Gln Pro Thr Gly Ala Pro
340 345 350

Ser Pro Glu Val Gly Pro Lys Pro Glu Ser Pro Pro Thr Pro Glu Pro
355 360 365

Pro Arg Pro Glu Glu Ala Pro Asp Leu Arg Glu Arg Trp Arg Ala Phe
370 375 380

Leu Glu Ala Leu Arg Pro Thr Leu Arg Ala Phe Val Arg Glu Ala Arg
385 390 395 400

Pro Glu Val Arg Glu Gly Gln Leu Cys Leu Ala Phe Pro Glu Asp Lys
405 410 415

Ala Phe His Tyr Arg Lys Ala Ser Glu Gln Lys Val Arg Leu Leu Pro
420 425 430

Leu Ala Gln Ala His Phe Gly Val Glu Glu Val Val Leu Val Leu Glu
435 440 445

Gly Glu Lys Lys Ser Leu Ser Pro Arg Pro Arg Pro Ala Pro Pro Pro
450 455 460

Glu Ala Pro Ala Pro Pro Gly Pro Pro Glu Glu Glu Val Glu Ala Glu
465 470 475 480

Glu Ala Ala Glu Glu Ala Pro Glu Glu Ala Leu Arg Arg Val Val Arg
485 490 495

Leu Leu Gly Gly Arg Val Leu Trp Val Arg Arg Pro Arg Thr Arg Glu
500 505 510

Ala Pro Glu Glu Glu Pro Leu Ser Gln Asp Glu Ile Gly Gly Thr Gly
515 520 525

Ile



<210> 3 
<211> 1590 
<212> DNA 

<213> Thermus thermophilus 
<400> 3 

gtgagcgccc tctaccgccg cttccgcccc ctcaccttcc aggaggtggt ggggcaggag 60
cacgtgaagg agcccctcct caaggccatc cgggagggga ggctcgccca ggcctacctc 120
ttctccgggc ccaggggcgt gggcaagacc accacggcga ggctcctcgc catggcggtg 180
gggtgccagg gggaagaccc cccttgcggg gtctgccccc actgccaggc ggtgcagagg 240
ggcgcccacc cggacgtggt ggacattgac gccgccagca acaactccgt ggaggacgtg 300
cgggagctga gggaaaggat ccacctcgcc cccctctctg cccccaggaa ggtcttcatc 360
ctggacgagg cccacatgct ctccaaaagc gccttcaacg ccctcctcaa gaccctggag 420
gagcccccgc cccacgtcct cttcgtcttc gccaccaccg agcccgagag gatgcccccc 480
accatcctct cccgcaccca gcacttccgc ttccgccgcc tcacggagga ggagatcgcc 540
tttaagctcc ggcgcatcct ggaggccgtg gggcgggagg cggaggagga ggccctcctc 600
ctcctcgccc gcctggcgga cggggccctt agggacgcgg aaagcctcct ggagcgcttc 660
ctcctcctgg aaggccccct cacccggaag gaggtggagc gcgccctagg ctccccccca 720
gggaccgggg tggccgagat cgccgcctcc ctcgcgaggg ggaaaacggc ggaggccctg 780
ggcctcgccc ggcgcctcta cggggaaggg tacgccccga ggagcctggt ctcgggcctt 840
ttggaggtgt tccgggaagg cctctacgcc gccttcggcc tcgcgggaac cccccttccc 900
gccccgcccc aggccctgat cgccgccatg accgccctgg acgaggccat ggagcgcctc 960
gcccgccgct ccgacgcctt aagcctggag gtggccctcc tggaggcggg aagggccctg 1020
gccgccgagg ccctacccca gcccacgggc gctccttccc cagaggtcgg ccccaagccg 1080
gaaagccccc cgaccccgga acccccaagg cccgaggagg cgcccgacct gcgggagcgg 1140
tggcgggcct tcctcgaggc cctcaggccc accctacggg ccttcgtgcg ggaggcccgc 1200
ccggaggtcc gggaaggcca gctctgcctc gctttccccg aggacaaggc cttccactac 1260
cgcaaggcct cggaacagaa ggtgaggctc ctccccctgg cccaggccca tttcggggtg 1320
gaggaggtcg tcctcgtcct ggagggagaa aaaaaaagcc tgagcccaag gccccgcccg 1380
gccccacctc ctgaagcgcc cgcacccccg ggccctcccg aggaggaggt agaggcggag 1440
gaagcggcgg aggaggcccc ggaggaggcc ttgaggcggg tggtccgcct cctggggggg 1500
cgggtgctct gggtgcggcg gcccaggacc cgggaggcgc cggaggagga acccctgagc 1560
caagacgaga tagggggtac tggtatataa 1590



<210> 4 
<211> 464 
<212> PRT 

<213> Thermus thermophilus
<400> 4 

Met Ser Ala Leu Tyr Arg Arg Phe Arg Pro Leu Thr Phe Gln Glu Val
1 5 10 15

Val Gly Gln Glu His Val Lys Glu Pro Leu Leu Lys Ala Ile Arg Glu
20 25 30

Gly Arg Leu Ala Gln Ala Tyr Leu Phe Ser Gly Pro Arg Gly Val Gly
35 40 45

Lys Thr Thr Thr Ala Arg Leu Leu Ala Met Ala Val Gly Cys Gln Gly
50 55 60

Glu Asp Pro Pro Cys Gly Val Cys Pro His Cys Gln Ala Val Gln Arg
65 70 75 80

Gly Ala His Pro Asp Val Val Asp Ile Asp Ala Ala Ser Asn Asn Ser
85 90 95

Val Glu Asp Val Arg Glu Leu Arg Glu Arg Ile His Leu Ala Pro Leu
100 105 110

Ser Ala Pro Arg Lys Val Phe Ile Leu Asp Glu Ala His Met Leu Ser
115 120 125

Lys Ser Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro Pro Pro
130 135 140

His Val Leu Phe Val Phe Ala Thr Thr Glu Pro Glu Arg Met Pro Pro
145 150 155 160

Thr Ile Leu Ser Arg Thr Gln His Phe Arg Phe Arg Arg Leu Thr Glu
165 170 175

Glu Glu Ile Ala Phe Lys Leu Arg Arg Ile Leu Glu Ala Val Gly Arg
180 185 190

Glu Ala Glu Glu Glu Ala Leu Leu Leu Leu Ala Arg Leu Ala Asp Gly
195 200 205

Ala Leu Arg Asp Ala Glu Ser Leu Leu Glu Arg Phe Leu Leu Leu Glu
210 215 220

Gly Pro Leu Thr Arg Lys Glu Val Glu Arg Ala Leu Gly Ser Pro Pro
225 230 235 240

Gly Thr Gly Val Ala Glu Ile Ala Ala Ser Leu Ala Arg Gly Lys Thr
245 250 255

Ala Glu Ala Leu Gly Leu Ala Arg Arg Leu Tyr Gly Glu Gly Tyr Ala
260 265 270

Pro Arg Ser Leu Val Ser Gly Leu Leu Glu Val Phe Arg Glu Gly Leu
275 280 285

Tyr Ala Ala Phe Gly Leu Ala Gly Thr Pro Leu Pro Ala Pro Pro Gln
290 295 300

Ala Leu Ile Ala Ala Met Thr Ala Leu Asp Glu Ala Met Glu Arg Leu
305 310 315 320

Ala Arg Arg Ser Asp Ala Leu Ser Leu Glu Val Ala Leu Leu Glu Ala
325 330 335

Gly Arg Ala Leu Ala Ala Glu Ala Leu Pro Gln Pro Thr Gly Ala Pro
340 345 350

Ser Pro Glu Val Gly Pro Lys Pro Glu Ser Pro Pro Thr Pro Glu Pro
355 360 365

Pro Arg Pro Glu Glu Ala Pro Asp Leu Arg Glu Arg Trp Arg Ala Phe
370 375 380

Leu Glu Ala Leu Arg Pro Thr Leu Arg Ala Phe Val Arg Glu Ala Arg
385 390 395 400

Pro Glu Val Arg Glu Gly Gln Leu Cys Leu Ala Phe Pro Glu Asp Lys
405 410 415

Ala Phe His Tyr Arg Lys Ala Ser Glu Gln Lys Val Arg Leu Leu Pro
420 425 430

Leu Ala Gln Ala His Phe Gly Val Glu Glu Val Val Leu Val Leu Glu
435 440 445

Gly Glu Lys Lys Lys Pro Glu Pro Lys Ala Pro Pro Gly Pro Thr Ser
450 455 460



<210> 5 
<211> 454 
<212> PRT 

<213> Thermus thermophilus 
<400> 5 

Met Ser Ala Leu Tyr Arg Arg Phe Arg Pro Leu Thr Phe Gln Glu Val
1 5 10 15

Val Gly Gln Glu His Val Lys Glu Pro Leu Leu Lys Ala Ile Arg Glu
20 25 30

Gly Arg Leu Ala Gln Ala Tyr Leu Phe Ser Gly Pro Arg Gly Val Gly
35 40 45

Lys Thr Thr Thr Ala Arg Leu Leu Ala Met Ala Val Gly Cys Gln Gly
50 55 60

Glu Asp Pro Pro Cys Gly Val Cys Pro His Cys Gln Ala Val Gln Arg
65 70 75 80

Gly Ala His Pro Asp Val Val Asp Ile Asp Ala Ala Ser Asn Asn Ser
85 90 95

Val Glu Asp Val Arg Glu Leu Arg Glu Arg Ile His Leu Ala Pro Leu
100 105 110

Ser Ala Pro Arg Lys Val Phe Ile Leu Asp Glu Ala His Met Leu Ser
115 120 125

Lys Ser Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro Pro Pro
130 135 140

His Val Leu Phe Val Phe Ala Thr Thr Glu Pro Glu Arg Met Pro Pro
145 150 155 160

Thr Ile Leu Ser Arg Thr Gln His Phe Arg Phe Arg Arg Leu Thr Glu
165 170 175

Glu Glu Ile Ala Phe Lys Leu Arg Arg Ile Leu Glu Ala Val Gly Arg
180 185 190

Glu Ala Glu Glu Glu Ala Leu Leu Leu Leu Ala Arg Leu Ala Asp Gly
195 200 205

Ala Leu Arg Asp Ala Glu Ser Leu Leu Glu Arg Phe Leu Leu Leu Glu
210 215 220

Gly Pro Leu Thr Arg Lys Glu Val Glu Arg Ala Leu Gly Ser Pro Pro
225 230 235 240

Gly Thr Gly Val Ala Glu Ile Ala Ala Ser Leu Ala Arg Gly Lys Thr
245 250 255

Ala Glu Ala Leu Gly Leu Ala Arg Arg Leu Tyr Gly Glu Gly Tyr Ala
260 265 270

Pro Arg Ser Leu Val Ser Gly Leu Leu Glu Val Phe Arg Glu Gly Leu
275 280 285

Tyr Ala Ala Phe Gly Leu Ala Gly Thr Pro Leu Pro Ala Pro Pro Gln
290 295 300

Ala Leu Ile Ala Ala Met Thr Ala Leu Asp Glu Ala Met Glu Arg Leu
305 310 315 320

Ala Arg Arg Ser Asp Ala Leu Ser Leu Glu Val Ala Leu Leu Glu Ala
325 330 335

Gly Arg Ala Leu Ala Ala Glu Ala Leu Pro Gln Pro Thr Gly Ala Pro
340 345 350

Ser Pro Glu Val Gly Pro Lys Pro Glu Ser Pro Pro Thr Pro Glu Pro
355 360 365

Pro Arg Pro Glu Glu Ala Pro Asp Leu Arg Glu Arg Trp Arg Ala Phe
370 375 380

Leu Glu Ala Leu Arg Pro Thr Leu Arg Ala Phe Val Arg Glu Ala Arg
385 390 395 400

Pro Glu Val Arg Glu Gly Gln Leu Cys Leu Ala Phe Pro Glu Asp Lys
405 410 415

Ala Phe His Tyr Arg Lys Ala Ser Glu Gln Lys Val Arg Leu Leu Pro
420 425 430

Leu Ala Gln Ala His Phe Gly Val Glu Glu Val Val Leu Val Leu Glu
435 440 445

Gly Glu Lys Lys Lys Ala
450



<210> 6 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 6 

cgcaagcttc acgcstacct sttctccggs ac 32



<210> 7 
<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 



<400> 7 

His Ala Tyr Leu Phe Ser Gly Thr 
1 5 



<210> 8 
<211> 34 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer
<400> 8 

cgcgaattcg tgctcsggsg gctcctcsag sgtc 34 



<210> 9 
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 9 

Lys Thr Leu Glu Glu Pro Pro Glu His 
1 5 



<210> 10 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 10 

gcgcggatcc ggagggagaa aaaaaaagcc tcagccca 38



<210> 11 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 11 

gcgcggatcc ggagggagag aagaaaagcc tcagccca 38






<210> 12 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 12 

gaattaaatt cgcgcttcgg gaggtggg 



<210> 13 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 13 

gcgcgaattc gcgcttcggg aggtggg 



<210> 14 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 14 

gcgcgaattc gggcgcttca ggaggtggg 



<210> 15 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 15 

gtggtgcata tggtgagcgc cctctaccgc c 



<210> 16 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 16

gtggtggtcg acccaggagg gccacctcca g 31 



<210> 17 
<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 17 

Gly Xaa Xaa Gly Xaa Gly Lys Thr 
1 5 



<210> 18 
<211> 12 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 18 

Lys Pro Asp Pro Lys Ala Pro Pro Gly Pro Thr Ser 
1 5 10



<210> 19 
<211> 180 
<212> PRT 

<213> Escherichia coli 
<400> 19 

Met Ser Tyr Gln Val Leu Ala Arg Lys Trp Arg Pro Gln Thr Phe Ala
1 5 10 15

Asp Val Val Gly Gln Glu His Val Leu Thr Ala Leu Ala Asn Gly Leu
20 25 30

Ser Leu Gly Arg Ile His His Ala Tyr Leu Phe Ser Gly Thr Arg Gly
35 40 45

Val Gly Lys Thr Ser Ile Ala Arg Leu Leu Ala Lys Gly Leu Asn Cys
50 55 60

Glu Thr Gly Ile Thr Ala Thr Pro Cys Gly Val Cys Asp Asn Cys Arg
65 70 75 80

Glu Ile Glu Gln Gly Arg Phe Val Asp Leu Ile Glu Ile Asp Ala Ala
85 90 95

Ser Arg Thr Lys Val Glu Asp Thr Arg Asp Leu Leu Asp Asn Val Gln
100 105 110

Tyr Ala Pro Ala Arg Gly Arg Phe Lys Val Tyr Leu Ile Asp Glu Val
115 120 125

His Met Leu Ser Arg His Ser Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Glu His Val Lys Phe Leu Leu Ala Thr Thr Asp Pro Gln
145 150 155 160

Lys Leu Pro Val Thr Ile Leu Ser Arg Cys Leu Gln Phe His Leu Lys
165 170 175

Ala Leu Asp Val
180



<210> 20 
<211> 180 
<212> PRT 

<213> Bacillus subtilis 
<400> 20 

Met Ser Tyr Gln Ala Leu Tyr Arg Val Phe Arg Pro Gln Arg Phe Glu
1 5 10 15

Asp Val Val Gly Gln Glu His Ile Thr Lys Thr Leu Gln Asn Ala Leu
20 25 30

Leu Gln Lys Lys Phe Ser His Ala Tyr Leu Phe Ser Gly Pro Arg Gly
35 40 45

Thr Gly Lys Thr Ser Ala Ala Lys Ile Phe Ala Lys Ala Val Asn Cys
50 55 60

Glu His Ala Pro Val Asp Glu Pro Cys Asn Glu Cys Ala Ala Cys Lys
65 70 75 80

Gly Ile Thr Asn Gly Ser Ile Ser Asp Val Ile Glu Ile Asp Ala Ala
85 90 95

Ser Asn Asn Gly Val Asp Glu Ile Arg Asp Ile Arg Asp Lys Val Lys
100 105 110

Phe Ala Pro Ser Ala Val Thr Tyr Lys Val Tyr Ile Ile Asp Glu Val
115 120 125

His Met Leu Ser Ile Gly Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Glu His Cys Ile Phe Ile Leu Ala Thr Thr Glu Pro His
145 150 155 160

Lys Ile Pro Leu Thr Ile Ile Ser Arg Cys Gln Arg Phe Asp Phe Lys
165 170 175

Arg Ile Thr Ser
180



<210> 21 
<211> 294 
<212> PRT 

<213> Escherichia coli 
<400> 21 

Met Ser Tyr Gln Val Leu Ala Arg Lys Trp Arg Pro Gln Thr Phe Ala
1 5 10 15

Asp Val Val Gly Gln Glu His Val Leu Thr Ala Leu Ala Asn Gly Leu
20 25 30

Ser Leu Gly Arg Ile His His Ala Tyr Leu Phe Ser Gly Thr Arg Gly
35 40 45

Val Gly Lys Thr Ser Ile Ala Arg Leu Leu Ala Lys Gly Leu Asn Cys
50 55 60

Glu Thr Gly Ile Thr Ala Thr Pro Cys Gly Val Cys Asp Asn Cys Arg
65 70 75 80

Glu Ile Glu Gln Gly Arg Phe Val Asp Leu Ile Glu Ile Asp Ala Ala
85 90 95

Ser Arg Thr Lys Val Glu Asp Thr Arg Asp Leu Leu Asp Asn Val Gln
100 105 110

Tyr Ala Pro Ala Arg Gly Arg Phe Lys Val Tyr Leu Ile Asp Glu Val
115 120 125

His Met Leu Ser Arg His Ser Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Glu His Val Lys Phe Leu Leu Ala Thr Thr Asp Pro Gln
145 150 155 160

Lys Leu Pro Val Thr Ile Leu Ser Arg Cys Leu Gln Phe His Leu Lys
165 170 175

Ala Leu Asp Val Glu Gln Ile Arg His Gln Leu Glu His Ile Leu Asn
180 185 190

Glu Glu His Ile Ala His Glu Pro Arg Ala Leu Gln Leu Leu Ala Arg
195 200 205

Ala Ala Glu Gly Ser Leu Arg Asp Ala Leu Ser Leu Thr Asp Gln Ala
210 215 220

Ile Ala Ser Gly Asp Gly Gln Val Ser Thr Gln Ala Val Ser Ala Met
225 230 235 240

Leu Gly Thr Leu Asp Asp Asp Gln Ala Leu Ser Leu Val Glu Ala Met
245 250 255

Val Glu Ala Asn Gly Glu Arg Val Met Ala Leu Ile Asn Glu Ala Ala
260 265 270

Ala Arg Gly Ile Glu Trp Glu Ala Leu Leu Val Glu Met Leu Gly Leu
275 280 285

Leu His Arg Ile Ala Met
290



<210> 22 
<211> 294 



<212> PRT 

<213> Haemophilus influenzae 



<400> 22 

Met Ser Tyr Gln Val Leu Ala Arg Lys Trp Arg Pro Lys Thr Phe Ala
1 5 10 15

Asp Val Val Gly Gln Glu His Ile Ile Thr Ala Leu Ala Asn Gly Leu
20 25 30

Lys Asp Asn Arg Leu His His Ala Tyr Leu Phe Ser Gly Thr Arg Gly
35 40 45

Val Gly Lys Thr Ser Ile Ala Arg Leu Phe Ala Lys Gly Leu Asn Cys
50 55 60

Val His Gly Val Thr Ala Thr Pro Cys Gly Glu Cys Glu Asn Cys Lys
65 70 75 80

Ala Ile Glu Gln Gly Asn Phe Ile Asp Leu Ile Glu Ile Asp Ala Ala
85 90 95

Ser Arg Thr Lys Val Glu Asp Thr Arg Glu Leu Leu Asp Asn Val Gln
100 105 110

Tyr Lys Pro Val Val Gly Arg Phe Lys Val Tyr Leu Ile Asp Glu Val
115 120 125

His Met Leu Ser Arg His Ser Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Glu Tyr Val Lys Phe Leu Leu Ala Thr Thr Asp Pro Gln
145 150 155 160

Lys Leu Pro Val Thr Ile Leu Ser Arg Cys Leu Gln Phe His Leu Lys
165 170 175

Ala Leu Asp Glu Thr Gln Ile Ser Gln His Leu Ala His Ile Leu Thr
180 185 190

Gln Glu Asn Ile Pro Phe Glu Asp Pro Ala Leu Val Lys Leu Ala Lys
195 200 205

Ala Ala Gln Gly Ser Ile Arg Asp Ser Leu Ser Leu Thr Asp Gln Ala
210 215 220

Ile Ala Met Gly Asp Arg Gln Val Thr Asn Asn Val Val Ser Asn Met
225 230 235 240

Leu Gly Leu Leu Asp Asp Asn Tyr Ser Val Asp Ile Leu Tyr Ala Leu
245 250 255

His Gln Gly Asn Gly Glu Leu Leu Met Arg Thr Leu Gln Arg Val Ala
260 265 270

Asp Ala Ala Gly Asp Trp Asp Lys Leu Leu Gly Glu Cys Ala Glu Lys
275 280 285

Leu His Gln Ile Ala Leu
290



<210> 23 
<211> 294 
<212> PRT 

<213> Bacillus subtilis 
<400> 23 

Met Ser Tyr Gln Ala Leu Tyr Arg Val Phe Arg Pro Gln Arg Phe Glu
1 5 10 15

Asp Val Val Gly Gln Glu His Ile Thr Lys Thr Leu Gln Asn Ala Leu
20 25 30

Leu Gln Lys Lys Phe Ser His Ala Tyr Leu Phe Ser Gly Pro Arg Gly
35 40 45

Thr Gly Lys Thr Ser Ala Ala Lys Ile Phe Ala Lys Ala Val Asn Cys
50 55 60

Glu His Ala Pro Val Asp Glu Pro Cys Asn Glu Cys Ala Ala Cys Lys
65 70 75 80

Gly Ile Thr Asn Gly Ser Ile Ser Asp Val Ile Glu Ile Asp Ala Ala
85 90 95

Ser Asn Asn Gly Val Asp Glu Ile Arg Asp Ile Arg Asp Lys Val Lys
100 105 110

Phe Ala Pro Ser Ala Val Thr Tyr Lys Val Tyr Ile Ile Asp Glu Val
115 120 125

His Met Leu Ser Ile Gly Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Glu His Cys Ile Phe Ile Leu Ala Thr Thr Glu Pro His
145 150 155 160

Lys Ile Pro Leu Thr Ile Ile Ser Arg Cys Gln Arg Phe Asp Phe Lys
165 170 175

Arg Ile Thr Ser Gln Ala Ile Val Gly Arg Met Asn Lys Ile Val Asp
180 185 190

Ala Glu Gln Leu Gln Val Glu Glu Gly Ser Leu Glu Ile Ile Ala Ser
195 200 205

Ala Ala His Gly Gly Met Arg Asp Ala Leu Ser Leu Leu Asp Gln Ala
210 215 220

Ile Ser Phe Ser Gly Asp Ile Leu Lys Val Glu Asp Ala Leu Leu Ile
225 230 235 240

Thr Gly Ala Val Ser Gln Leu Tyr Ile Gly Lys Leu Ala Lys Ser Leu
245 250 255

His Asp Lys Asn Val Ser Asp Ala Leu Glu Thr Leu Asn Glu Leu Leu
260 265 270

Gln Gln Gly Lys Asp Pro Ala Lys Leu Ile Glu Asp Met Ile Phe Tyr
275 280 285

Phe Arg Asp Met Leu Leu
290
285 



<210> 24 
<211> 300 
<212> PRT 

<213> Caulobacter crescentus 
<400> 24 

Asp Ala Tyr Thr Val Leu Ala Arg Lys Tyr Arg Pro Arg Thr Phe Glu
1 5 10 15

Asp Leu Ile Gly Gln Glu Ala Met Val Arg Thr Leu Ala Asn Ala Phe
20 25 30

Ser Thr Gly Arg Ile Ala His Ala Phe Met Leu Thr Gly Val Arg Gly
35 40 45

Val Gly Lys Thr Thr Thr Ala Arg Leu Leu Ala Arg Ala Leu Asn Tyr
50 55 60

Glu Thr Asp Thr Val Lys Gly Pro Ser Val Asp Leu Thr Thr Glu Gly
65 70 75 80

Tyr His Cys Arg Ser Ile Ile Glu Gly Arg His Met Asp Val Leu Glu
85 90 95

Leu Asp Ala Ala Ser Arg Thr Lys Val Asp Glu Met Arg Glu Leu Leu
100 105 110

Asp Gly Val Arg Tyr Ala Pro Val Glu Ala Arg Tyr Lys Val Tyr Ile
115 120 125

Ile Asp Glu Val His Met Leu Ser Thr Ala Ala Phe Asn Ala Leu Leu
130 135 140

Lys Thr Leu Glu Glu Pro Pro Pro His Ala Lys Phe Ile Phe Ala Thr
145 150 155 160

Thr Glu Ile Arg Lys Val Pro Val Thr Ile Leu Ser Arg Cys Gln Arg
165 170 175

Phe Asp Leu Arg Arg Val Glu Pro Asp Val Leu Val Lys His Phe Asp
180 185 190

Arg Ile Ser Ala Lys Glu Gly Ala Arg Ile Glu Met Asp Ala Leu Ala
195 200 205

Leu Ile Ala Arg Ala Ala Glu Gly Ser Val Arg Asp Gly Leu Ser Leu
210 215 220

Leu Asp Gln Ala Ile Val Gln Thr Glu Arg Gly Gln Thr Val Thr Ser
225 230 235 240

Thr Val Val Arg Asp Met Leu Gly Leu Ala Asp Arg Ser Gln Thr Ile
245 250 255

Ala Leu Tyr Glu His Val Met Ala Gly Lys Thr Lys Asp Ala Leu Glu
260 265 270

Gly Phe Arg Ala Leu Trp Gly Phe Gly Ala Asp Pro Ala Val Val Met
275 280 285

Leu Asp Val Leu Asp His Cys His Ala Ser Ala Val
290 295 300



<210> 25 
<211> 260 
<212> PRT

<213> Mycoplasma genitalium
<400> 25 

Met His Gln Val Phe Tyr Gln Lys Tyr Arg Pro Ile Asn Phe Lys Gln
1 5 10 15

Thr Leu Gly Gln Glu Ser Ile Arg Lys Ile Leu Val Asn Ala Ile Asn
20 25 30

Arg Asp Lys Leu Pro Asn Gly Tyr Ile Phe Ser Gly Glu Arg Gly Thr
35 40 45

Gly Lys Thr Thr Phe Ala Lys Ile Ile Ala Lys Ala Ile Asn Cys Leu
50 55 60

Asn Trp Asp Gln Ile Asp Val Cys Asn Ser Cys Asp Val Cys Lys Ser
65 70 75 80

Ile Asn Thr Asn Ser Ala Ile Asp Ile Val Glu Ile Asp Ala Ala Ser
85 90 95

Lys Asn Gly Ile Asn Asp Ile Arg Glu Leu Val Glu Asn Val Phe Asn
100 105 110

His Pro Phe Thr Phe Lys Lys Lys Val Tyr Ile Leu Asp Glu Ala His
115 120 125

Met Leu Thr Thr Gln Ser Trp Gly Gly Leu Leu Lys Thr Leu Glu Glu
130 135 140

Ser Pro Pro Tyr Val Leu Phe Ile Phe Thr Thr Thr Glu Phe Asn Lys
145 150 155 160

Ile Pro Leu Thr Ile Leu Ser Arg Cys Gln Ser Phe Phe Phe Lys Lys
165 170 175

Ile Thr Ser Asp Leu Ile Leu Glu Arg Leu Asn Asp Ile Ala Lys Lys
180 185 190

Glu Lys Ile Lys Ile Glu Lys Asp Ala Leu Ile Lys Ile Ala Asp Leu
195 200 205

Ser Gln Gly Ser Leu Arg Asp Gly Leu Ser Leu Leu Asp Gln Leu Ala
210 215 220

Ile Ser Leu Ile Val Lys Lys Leu Val Leu Leu Met Leu Lys Lys His
225 230 235 240

Leu Ile Ser Leu Ile Glu Met Gln Asn Leu Leu Leu Leu Lys Gln Phe
245 250 255

Tyr Gln Glu Ile
260



<210> 26 
<211> 289 
<212> PRT 

<213> Thermus thermophilus 
<400> 26 

Val Ser Ala Leu Tyr Arg Arg Phe Arg Pro Leu Thr Phe Gln Glu Val
1 5 10 15

Val Gly Gln Glu His Val Lys Glu Pro Leu Leu Lys Ala Ile Arg Glu
20 25 30

Gly Arg Leu Ala Gln Ala Tyr Leu Phe Ser Gly Pro Arg Gly Val Gly
35 40 45

Lys Thr Thr Thr Ala Arg Leu Leu Ala Met Ala Val Gly Cys Gln Gly
50 55 60

Glu Asp Pro Pro Cys Gly Val Cys Pro His Cys Gln Ala Val Gln Arg
65 70 75 80

Gly Ala His Pro Asp Val Val Asp Ile Asp Ala Ala Ser Asn Asn Ser
85 90 95

Val Glu Asp Val Arg Glu Leu Arg Glu Arg Ile His Leu Ala Pro Leu
100 105 110

Ser Ala Pro Arg Lys Val Phe Ile Leu Asp Glu Ala His Met Leu Ser
115 120 125

Lys Ser Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro Pro Pro
130 135 140

His Val Leu Phe Val Phe Ala Thr Thr Glu Pro Glu Arg Met Pro Pro
145 150 155 160

Thr Ile Leu Ser Arg Thr Gln His Phe Arg Phe Arg Arg Leu Thr Glu
165 170 175

Glu Glu Ile Ala Phe Lys Leu Arg Arg Ile Leu Glu Ala Val Gly Arg
180 185 190

Glu Ala Glu Glu Glu Ala Leu Leu Leu Leu Ala Arg Leu Ala Asp Gly
195 200 205

Ala Leu Arg Asp Ala Glu Ser Leu Leu Glu Arg Phe Leu Leu Leu Glu
210 215 220

Gly Pro Leu Thr Arg Lys Glu Val Glu Arg Ala Leu Gly Ser Pro Pro
225 230 235 240

Gly Thr Gly Val Ala Glu Ile Ala Ala Ser Leu Ala Arg Gly Lys Thr
245 250 255

Ala Glu Ala Leu Gly Leu Ala Arg Arg Leu Tyr Gly Glu Gly Tyr Ala
260 265 270

Pro Arg Ser Leu Val Ser Gly Leu Leu Glu Val Phe Arg Glu Gly Leu
275 280 285

Tyr



<210> 27 
<211> 94 
<212> DNA 

<213> Thermus thermophilus 
<400> 27 

gccggaggga gaaaaaaaaa gccgagccca aggccccgcc cggccccacc ccgaagcgcc 60
cgcacccccg ggccccccga ggaggaggag aggc 94 

<210> 28 
<211> 11 
<212> PRT 

<213> Thermus thermophilus 
<400> 28 

Val Leu Glu Gly Glu Lys Lys Ser Leu Ser Pro 
1 5 10



<210> 29 
<211> 23 
<212> DNA 

<213> Artificial Sequence 




<220> 

<223> Description of Artificial Sequence: primer 
<400> 29 

cacgcntacc tnttctccgg nac 23



<210> 30 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer
<400> 30 

gtgctcnggn ggctcctcnt cngtc 



<210> 31 
<211> 33 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<400> 31 

gtgggatccg tggttctgga tctcgatgaa gaa 



<210> 32 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 32 

gtgggatcca cggsctstcs gagcagaag 



<210> 33 
<211> 34 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 



<400> 33 

gcgggatcct caacgaggac ctctccatct tcaa 

<210> 34 
<211> 35 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 34 

gcgggatcct tgtcgtcsag sgtsagsgcg tcgta 

<210> 35 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 35 

gggaaggacc agcgcgtact ccccctgctc ctaggtgtg 



<210> 36 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 36

gtgtggatcc ttcttcttsc ccatsgc 



<210> 37 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 



<400> 37 

caccgattcc agtggtgcct aggtgtg 

<210> 38 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 38 

caacacctgg tgttccagga gcctgtgctt 



<210> 39 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 39 

ccagaatcgt ctgctggtcg tag 



<210> 40 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 40 

agcaccctgg aggagcttc 



<210> 41 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 
<400> 41 

catgtcgtac tgggtgtac 



<210> 42 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> unsure
<222> (7) 

<223> N at any position in this sequence is A, C, G, or 
T 

<400> 42 

gtsgtsnnsg acnnsgagac sacsggg 

<210> 43 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> unsure 
<222> (8) 

<223> N at any position in this sequence is A, C, G, or 
T 

<400> 43 

gaasccsnng tcgaasnngg cgttgtg 



<210> 44 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 



<400> 44 

cggggatcca cctcaatcac ctcgtgg 

<210> 45 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 45 

cggggatccg ccaccttgcg gctccgggtg 



<210> 46 
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 46 

gcgctctaga cgagttccca aagcgtgcgg t 



<210> 47 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 47 

cgcgtctaga tcacctgtat ccaga 



<210> 48 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 



<400> 48 

gcggcgcata tggtggtggt cctggacctg gag 



<210> 49 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 49 

cgcgtctaga tcacctgtat ccaga 



<210> 50 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 50 

gtsctsgtsa agacscactt 



<210> 51 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 51 

sagsagsgcg ttgaasgtgt g 



<210> 52 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 
<400> 52 

ctcgttggtg aaagtttccg tg 

<210> 53 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 53 

ctcgttggtg aaagtttccg tg 



<210> 54 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 54 

tctggcaaca cgttctggag cacatcc 



<210> 55 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 55 

tgctggcgtt catcttcagg atg 



<210> 56
<211> 23
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer
<400> 56 

catcctgaag atgaacgcca gca 



<210> 57 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 57 

aggttatcca caggggtcat gtgca 



<210> 58 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 58 

gtgtgtcata tgaacataac ggttcccaa 



<210> 59 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 59 

gcgcgaattc tcccttgtgg aaggcttag 



<210> 60 
<211> 13 
<212> PRT 

<213> Thermus thermophilus 






<400> 60 

Arg Val Glu Leu Asp Tyr Asp Ala Leu Thr Leu Asp Asp 
1 5 10



<210> 61 
<211> 14 
<212> PRT 

<213> Thermus thermophilus 
<400> 61 

Phe Phe Ile Glu Ile Gln Asn His Gly Leu Ser Glu Gln Lys
1 5 10



<210> 62 
<211> 8 
<212> PRT 

<213> Thermus thermophilus 
<400> 62 

Phe Phe Ile Glu Ile Gln Asn His
1 5 



<210> 63 
<211> 8 
<212> PRT 

<213> Thermus thermophilus 
<400> 63 

Tyr Asp Ala Leu Thr Leu Asp Asp 
1 5 



<210> 64 
<211> 6 
<212> PRT 

<213> Thermus thermophilus 
<400> 64 

Ala Met Gly Lys Lys Lys 
1 5 



<210> 65 
<211> 9 
<212> PRT

<213> Thermus thermophilus 
<400> 65 

Phe Asn Lys Ser His Ser Ala Ala Tyr 
1 5 



<210> 66 
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 66 

Val Val Xaa Asp Xaa Glu Thr Thr Gly 
1 5 



<210> 67 
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 67 

His Asn Ala Xaa Phe Asp Xaa Gly Phe 
1 5 



<210> 68 
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide 
<400> 68 

Val Val Xaa Asp Xaa Glu Thr Thr Gly 
1 5 



<210> 69 
<211> 7
<212> PRT 

<213> Thermus thermophilus 
<400> 69 

Val Leu Val Lys Thr His Leu 
1 5 



<210> 70 
<211> 6 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: peptide
<400> 70 

His Arg Ala Leu Tyr Asp 

1 5 



<210> 71 
<211> 7 
<212> PRT 

<213> Thermus thermophilus 
<400> 71 

His Thr Phe Asn Ala Leu Leu 
1 5 



<210> 72 
<211> 34 
<212> PRT 

<213> Escherichia coli 
<400> 72 

Asp Arg Tyr Phe Leu Glu Leu Ile Arg Thr Gly Arg Pro Asp Glu Glu
1 5 10 15

Ser Tyr Leu His Ala Ala Val Glu Leu Ala Glu Ala Arg Gly Leu Pro
20 25 30

Val Val

<210> 73
<211> 34 
<212> PRT 

<213> Vibrio cholerae 
<400> 73 

Asp His Phe Tyr Leu Glu Leu Ile Arg Thr Gly Arg Ala Asp Glu Glu
1 5 10 15

Ser Tyr Leu His Phe Ala Leu Asp Val Ala Glu Gln Tyr Asp Leu Pro
20 25 30

Val Val 



<210> 74 
<211> 34 
<212> PRT 

<213> Haemophilus influenzae 
<400> 74 

Asp His Phe Tyr Leu Ala Leu Ser Arg Thr Gly Arg Pro Asn Glu Glu
1 5 10 15

Arg Tyr Ile Gln Ala Ala Leu Lys Leu Ala Glu Arg Cys Asp Leu Pro
20 25 30

Leu Val



<210> 75 
<211> 34 
<212> PRT 

<213> Rickettsia prowazekii 
<400> 75 

Asp Arg Phe Tyr Phe Glu Ile Met Arg His Asp Leu Pro Glu Glu Gln
1 5 10 15

Phe Ile Glu Asn Ser Tyr Ile Gln Ile Ala Ser Glu Leu Ser Ile Pro
20 25 30

Ile Val






<210> 76 
<211> 34 
<212> PRT 

<213> Helicobacter pylori 
<400> 76 

Asp Asp Phe Tyr Leu Glu Ile Met Arg His Gly Ile Leu Asp Gln Arg
1 5 10 15

Phe Ile Asp Glu Gln Val Ile Lys Met Ser Leu Glu Thr Gly Leu Lys
20 25 30

Ile Ile



<210> 77 
<211> 34 
<212> PRT 

<213> Synechocystis sp.
<400> 77

Asp Asp Tyr Tyr Leu Glu Ile Gln Asp His Gly Ser Val Glu Asp Arg
1 5 10 15

Leu Val Asn Ile Asn Leu Val Lys Ile Ala Gln Glu Leu Asp Ile Lys
20 25 30

Ile Val



<210> 78 
<211> 34 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 78 

Asp Asn Tyr Phe Leu Glu Leu Met Asp His Gly Leu Thr Ile Glu Arg
1 5 10 15

Arg Val Arg Asp Gly Leu Leu Glu Ile Gly Arg Ala Leu Asn Ile Pro
20 25 30

Pro Leu

<210> 79
<211> 46 
<212> PRT 

<213> Escherichia coli 
<400> 79 

Asn Lys Arg Arg Ala Lys Asn Gly Glu Pro Pro Leu Asp Ile Ala Ala
1 5 10 15

Ile Pro Leu Asp Asp Lys Lys Ser Phe Asp Met Leu Gln Arg Ser Glu
20 25 30

Thr Thr Ala Val Phe Gln Leu Glu Ser Arg Gly Met Lys Asp
35 40 45



<210> 80 
<211> 46 
<212> PRT 

<213> Vibrio cholerae 
<400> 80 

Asn Pro Arg Leu Lys Lys Ala Gly Lys Pro Pro Val Arg Ile Glu Ala
1 5 10 15

Ile Pro Leu Asp Asp Ala Arg Ser Phe Arg Asn Leu Gln Asp Ala Lys
20 25 30

Thr Thr Ala Val Phe Gln Leu Glu Ser Arg Gly Met Lys Glu
35 40 45



<210> 81 
<211> 46 
<212> PRT 

<213> Haemophilus influenzae 
<400> 81 

Asn Val Arg Met Val Arg Glu Gly Lys Pro Arg Val Asp Ile Ala Ala
1 5 10 15

Ile Pro Leu Asp Asp Pro Glu Ser Phe Glu Leu Leu Lys Arg Ser Glu
20 25 30

Thr Thr Ala Val Phe Gln Leu Glu Ser Arg Gly Met Lys Asp
35 40 45

<210> 82
<211> 46 
<212> PRT 

<213> Rickettsia prowazekii 
<400> 82 

Cys Lys Lys Leu Leu Lys Glu Gln Gly Ile Lys Ile Asp Phe Asp Asp
1 5 10 15

Met Thr Phe Asp Asp Lys Lys Thr Tyr Gln Met Leu Cys Lys Gly Lys
20 25 30

Gly Val Gly Val Phe Gln Phe Glu Ser Ile Gly Met Lys Asp
35 40 45



<210> 83 
<211> 45 
<212> PRT 

<213> Helicobacter pylori 
<400> 83 

Leu Lys Ile Ile Lys Thr Gln His Lys Ile Ser Val Asp Phe Leu Ser
1 5 10 15

Leu Asp Met Asp Asp Pro Lys Val Tyr Lys Thr Ile Gln Ser Gly Asp
20 25 30

Thr Val Gly Ile Phe Gln Ile Glu Ser Gly Met Phe Gln
35 40 45



<210> 84 
<211> 46 
<212> PRT 

<213> Synechocystis sp. 
<400> 84 

Gln Glu Arg Lys Ala Leu Gln Ile Arg Ala Arg Thr Gly Ser Lys Lys
1 5 10 15

Leu Pro Asp Asp Val Lys Lys Thr His Lys Leu Leu Glu Ala Gly Asp
20 25 30

Leu Glu Gly Ile Phe Gln Leu Glu Ser Gln Gly Met Lys Gln
35 40 45

<210> 85
<211> 46 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 85 

Ile Asp Asn Val Arg Ala Asn Arg Gly Ile Asp Leu Asp Leu Glu Ser
1 5 10 15

Val Pro Leu Asp Asp Lys Ala Thr Tyr Glu Leu Leu Gly Arg Gly Asp
20 25 30

Thr Leu Gly Val Phe Gln Leu Asp Gly Gly Pro Met Arg Asp
35 40 45



<210> 86 
<211> 3729 
<212> DNA 

<213> Thermus thermophilus
<400> 86 

atgggccggg agctccgctt cgcccacctc caccagcaca cccagttctc cctcctggac 60
ggggcggcga agctttccga cctcctcaag tgggtcaagg agacgacccc cgaggacccc 120
gccttggcca tgaccgacca cggcaacctc ttcggggccg tggagttcta caagaaggcc 180
accgaaatgg gcatcaagcc catcctgggc tacgaggcct acgtggcggc ggaaagccgc 240
tttgaccgca agcggggaaa gggcctagac gggggctact ttcacctcac cctcctcgcc 300
aaggacttca cggggtacca gaacctggtg cgcctggcga gccgggctta cctggagggg 360
ttttacgaaa agccccggat tgaccgggag atcctgcgcg agcacgccga gggcctcatc 420
gccctctcgg ggtgcctcgg ggcggagatc ccccagttca tcctccagga ccgtctggac 480
ctggccgagg cccggctcaa cgagtacctc tccatcttca aggaccgctt cttcatcgag 540
atccagaacc acggcctccc cgagcagaaa aaggtcaacg aggtcctcaa ggagttcgcc 600
cgaaagtacg gcctggggat ggtggccacc aacgacggcc attacgtgag gaaggaggac 660
gcccgcgccc acgaggtcct cctcgccatc cagtccaaga gcaccctgga cgaccccggg 720
cgctggcgct tcccctgcga cgagttctac gtgaagaccc ccgaggagat gcgggccatg 780
ttccccgagg aggagtgggg ggacgagccc tttgacaaca ccgtggagat cgcccgcatg 840
tgcaacgtgg agctgcccat cggggacaag atggtctacc gaatcccccg cttccccctc 900
cccgaggggc ggaccgaggc ccagtacctc atggagctca ccttcaaggg gctcctccgc 960
cgctacccgg accggatcac cgagggcttc taccgggagg tcttccgcct tttggggaag 1020
cttccccccc acggggacgg ggaggccttg gccgaggcct tggcccaggt ggagcgggag 1080
gcttgggaga ggctcatgaa gagcctcccc cctttggccg gggtcaagga gtggacggcg 1140
gaggccattt tccaccgggc cctttacgag ctttccgtga tagagcgcat ggggtttccc 1200
ggctacttcc tcatcgtcca ggactacatc aactgggccc ggagaaacgg cgtctccgtg 1260
gggcccggca gggggagcgc cgccgggagc ctggtggcct acgccgtggg gatcaccaac 1320
attgaccccc tccgcttcgg cctcctcttt gagcgcttcc tgaacccgga gagggtctcc 1380
atgcccgaca ttgacacgga cttctccgac cgggagcggg accgggtgat ccagtacgtg 1440
cgggagcgct acggcgagga caaggtggcc cagatcggca ccctgggaag cctcgcctcc 1500
aaggccgccc tcaaggacgt ggcccgggtc tacggcatcc cccacaagaa ggcggaggaa 1560
ttggccaagc tcatcccggt gcagttcggg aagcccaagc ccctgcagga ggccatccag 1620
gtggtgccgg agcttagggc ggagatggag aaggacccca aggtgcggga ggtcctcgag 1680
gtggccatgc gcctggaggg cctgaaccgc cacgcctccg tccacgccgc cggggtggtg 1740
atcgccgccg agcccctcac ggacctcgtc cccctcatgc gcgaccagga agggcggccc 1800
gtcacccagt acgacatggg ggcggtggag gccttggggc ttttgaagat ggactttttg 1860
ggcctccgca ccctcacctt cctggacgag gtcaagcgca tcgtcaaggc gtcccagggg 1920
gtggagctgg actacgatgc cctccccctg gacgacccca agaccttcgc cctcctctcc 1980
cggggggaga ccaagggggt cttccagctg gagtcggggg ggatgaccgc cacgctccgc 2040
ggcctcaagc cgcggcgctt tgaggacctg atcgccatcc tctccctcta ccgccccggg 2100
cccatggagc acatccccac ctacatccgc cgccaccacg ggctggagcc cgtgagctac 2160
agcgagtttc cccacgccga gaagtaccta aagcccatcc tggacgagac ctacggcatc 2220
cccgtctacc aggagcagat catgcagatc gcctcggccg tggcggggta ctccctgggc 2280
gaggcggacc tcctgcggcg gtccatgggc aagaagaagg tggaggagat gaagtcccac 2340
cgggagcgct tcgtccaggg ggccaaggaa aggggcgtgc ccgaggagga ggccaaccgc 2400
ctctttgaca tgctggaggc cttcgccaac tacggcttca acaaatccca cgctgccgcc 2460
tacagcctcc tctcctacca gaccgcctac gtgaaggccc actaccccgt ggagttcatg 2520
gccgccctcc tctccgtgga gcggcacgac tccgacaagg tggccgagta catccgcgac 2580
gcccgggcca tgggcataga ggtccttccc ccggacgtca accgctccgg gtttgacttc 2640
ctggtccagg gccggcagat ccttttcggc ctctccgcgg tgaagaacgt gggcgaggcg 2700
gcggcggagg ccattctccg ggagcgggag cggggcggcc cctaccggag cctcggcgac 2760
ttcctcaagc ggctggacga gaaggtgctc aacaagcgga ccctggagtc cctcatcaag 2820
gcgggcgccc tggacggctt cggggaaagg gcgcggctcc tcgcctccct ggaagggctc 2880
ctcaagtggg cggccgagaa ccgggagaag gcccgctcgg gcatgatggg cctcttcagc 2940
gaagtggagg agccgccttt ggccgaggcc gcccccctgg acgagatcac ccggctccgc 3000
tacgagaagg aggccctggg gatctacgtc tccggccacc ccatcttgcg gtaccccggg 3060
ctccgggaga cggccacctg caccctggag gagcttcccc acctggcccg ggacctgccg 3120
ccccggtcta gggtcctcct tgccgggatg gtggaggagg tggtgcgcaa gcccacaaag 3180
agcggcggga tgatggcccg cttcgtcctc tccgacgaga cgggggcgct tgaggcggtg 3240
gcattcggcc gggcctacga ccaggtctcc ccgaggctca aggaggacac ccccgtgctc 3300
gtcctcgccg aggtggagcg ggaggagggg ggcgtgcggg tgctggccca ggccgtttgg 3360
acctacgagg agctggagca ggtcccccgg gccctcgagg tggaggtgga ggcctccctc 3420
ctggacgacc ggggggtggc ccacctgaaa agcctcctgg acgagcacgc ggggaccctc 3480
cccctgtacg tccgggtcca gggcgccttc ggcgaggccc tcctcgccct gagggaggtg 3540
cgggtggggg aggaggctgt aggcggccgc gtggttccgg gcctacctcc tgcccgaccg 3600
ggaggtcctt ctccagggcg gccaggcggg ggaggcccag gaggcggtgc ccttctaggg 3660
ggtgggccgt gagacctagc gccatcgttc tcgccggggg caaggaggcc tgggcccgac 3720
cccttttgg 

<210> 87
<211> 1245
<212> PRT

<213> Thermus thermophilus
<400> 87 

Met Gly Arg Glu Leu Arg Phe Ala His Leu His Gln His Thr Gln Phe
1 5 10 15

Ser Leu Leu Asp Gly Ala Pro Lys Leu Ser Asp Leu Leu Lys Trp Val
20 25 30

Glu Glu Thr Thr Pro Glu Asp Pro Ala Leu Ala Met Thr Asp His Gly
35 40 45

Asn Leu Phe Gly Ala Val Glu Phe Tyr Lys Lys Ala Thr Glu Met Gly
50 55 60

Ile Lys Pro Ile Leu Gly Tyr Glu Ala Tyr Val Ala Ala Glu Ser Arg
65 70 75 80

Phe Asp Arg Lys Arg Gly Lys Gly Leu Asp Gly Gly Tyr Phe His Leu
85 90 95

Thr Leu Leu Ala Lys Asp Phe Thr Gly Tyr Gln Asn Leu Val Arg Leu
100 105 110

Ala Ser Arg Ala Tyr Leu Glu Gly Phe Tyr Glu Lys Pro Arg Ile Asp
115 120 125

Arg Glu Ile Leu Arg Glu His Ala Glu Gly Leu Ile Ala Leu Ser Gly
130 135 140

Cys Leu Gly Ala Glu Ile Pro Gln Phe Ile Leu Gln Asp Arg Leu Asp
145 150 155 160

Leu Ala Glu Ala Arg Leu Asn Glu Tyr Leu Ser Ile Phe Lys Asp Arg
165 170 175

Phe Phe Ile Glu Ile Gln Asn His Gly Leu Pro Glu Gln Lys Lys Val
180 185 190

Asn Glu Val Leu Lys Glu Phe Ala Arg Lys Tyr Gly Leu Gly Met Val
195 200 205

Ala Thr Asn Asp Gly His Tyr Val Arg Lys Glu Asp Ala Arg Ala His
210 215 220

Glu Val Leu Leu Ala Ile Gln Ser Lys Ser Thr Leu Asp Asp Pro Gly
225 230 235 240

Ala Leu Ala Leu Pro Cys Glu Glu Phe Tyr Val Lys Thr Pro Glu Glu
245 250 255

Met Arg Ala Met Phe Pro Glu Glu Glu Val Gly Gly Arg Ser Pro Leu
260 265 270
Thr Thr Pro Trp Arg Ser Pro His Val Gln Arg Gly Ala Ala Ile Gly
275 280 285

Thr Arg Trp Ser Thr Arg Ile Pro Arg Phe Pro Leu Pro Glu Gly Arg
290 295 300

Thr Glu Ala Gln Tyr Leu Met Glu Leu Thr Phe Lys Gly Leu Leu Arg
305 310 315 320

Arg Tyr Pro Asp Arg Ile Thr Glu Gly Phe Tyr Arg Glu Val Phe Arg
325 330 335

Leu Ser Gly Lys Leu Pro Pro His Gly Asp Gly Glu Ala Leu Ala Glu
340 345 350

Ala Leu Ala Gln Val Glu Arg Glu Ala Trp Glu Arg Leu Met Lys Ser
355 360 365

Leu Pro Pro Leu Ala Gly Val Lys Glu Trp Thr Ala Glu Ala Ile Phe
370 375 380

His Arg Ala Leu Tyr Glu Leu Ser Ala Ile Glu Arg Met Gly Phe Pro
385 390 395 400

Gly Leu Leu Pro His Arg Pro Gly Leu His Gln Leu Gly Pro Glu Lys
405 410 415

Gly Val Ser Val Gly Pro Gly Arg Gly Gly Ala Ala Gly Ser Leu Val
420 425 430

Ala Tyr Ala Val Gly Ile Thr Asn Ile Asp Pro Leu Arg Phe Gly Leu
435 440 445

Leu Phe Glu Arg Phe Leu Asn Pro Glu Arg Val Ser Met Pro Asp Ile
450 455 460

Asp Thr Asp Phe Ser Asp Arg Glu Arg Asp Arg Val Ile Gln Tyr Val
465 470 475 480

Arg Glu Arg Tyr Gly Glu Asp Lys Val Ala Gln Ile Gly Thr Leu Gly
485 490 495

Ser Leu Ala Ser Lys Ala Ala Leu Lys Glu Val Ala Arg Val Tyr Gly
500 505 510

Ile Pro Arg Lys Lys Ala Glu Glu Leu Ala Lys Leu Ile Pro Val Gln
515 520 525
Phe Gly Lys Pro Lys Pro Leu Gln Glu Ala Ile Gln Val Val Pro Glu
530 535 540

Leu Arg Ala Glu Met Glu Lys Asp Pro Lys Val Arg Glu Val Leu Glu
545 550 555 560

Val Ala Met Arg Leu Glu Gly Leu Asn Arg His Ala Ser Val His Ala
565 570 575

Gly Arg Gly Gly Val Phe Ser Glu Pro Leu Thr Asp Leu Val Pro Leu
580 585 590

Cys Ala Thr Arg Lys Gly Gly Pro Tyr Thr Gln Tyr Asp Met Gly Ala
595 600 605

Val Glu Ala Leu Gly Leu Leu Lys Met Asp Phe Leu Gly Leu Arg Thr
610 615 620

Leu Thr Phe Leu Asp Glu Val Lys Arg Ile Val Lys Ala Ser Gln Gly
625 630 635 640

Val Glu Leu Asp Tyr Asp Ala Leu Pro Leu Asp Asp Pro Lys Thr Phe
645 650 655

Ala Leu Leu Ser Arg Gly Glu Thr Lys Gly Val Phe Gln Leu Glu Ser
660 665 670

Gly Gly Met Thr Ala Thr Leu Arg Gly Leu Lys Pro Arg Arg Phe Glu
675 680 685

Asp Leu Ile Ala Ile Leu Ser Leu Tyr Arg Pro Gly Pro Met Glu His
690 695 700

Ile Pro Thr Tyr Ile Arg Arg His His Gly Leu Glu Pro Val Ser Tyr
705 710 715 720

Ser Glu Phe Pro His Ala Glu Lys Tyr Leu Lys Pro Ile Leu Asp Glu
725 730 735

Thr Tyr Gly Ile Pro Val Tyr Gln Glu Gln Ile Met Gln Ile Ala Ser
740 745 750

Ala Val Ala Gly Tyr Ser Leu Gly Glu Ala Asp Leu Leu Arg Arg Ser
755 760 765

Met Gly Lys Lys Lys Val Glu Glu Met Lys Ser His Arg Glu Arg Phe
770 775 780
Val Gln Gly Ala Lys Glu Arg Gly Val Pro Glu Glu Glu Ala Asn Arg
785 790 795 800

Leu Phe Asp Met Leu Glu Ala Phe Ala Asn Tyr Gly Phe Asn Lys Ser
805 810 815

His Ala Ala Ala Tyr Ser Leu Leu Ser Tyr Gln Thr Ala Tyr Val Lys
820 825 830

Ala His Tyr Pro Val Glu Phe Met Ala Ala Leu Leu Ser Val Glu Arg
835 840 845

His Asp Ser Asp Lys Val Ala Glu Tyr Ile Arg Asp Ala Arg Ala Met
850 855 860

Gly Ile Glu Val Leu Pro Pro Asp Val Asn Arg Ser Gly Phe Asp Phe
865 870 875 880

Leu Val Gln Gly Arg Gln Ile Leu Phe Gly Leu Ser Ala Val Lys Asn
885 890 895

Val Gly Glu Ala Ala Ala Glu Ala Ile Leu Arg Glu Arg Glu Arg Gly
900 905 910

Gly Pro Tyr Arg Ser Leu Gly Asp Phe Leu Lys Arg Leu Asp Glu Lys
915 920 925

Val Leu Asn Lys Arg Thr Leu Glu Ser Leu Ile Lys Ala Gly Ala Leu
930 935 940

Asp Gly Phe Gly Glu Arg Ala Arg Leu Leu Ala Ser Leu Glu Gly Leu
945 950 955 960

Leu Lys Trp Ala Ala Glu Asn Arg Glu Lys Ala Arg Ser Gly Met Met
965 970 975

Gly Leu Phe Ser Glu Val Glu Glu Pro Pro Leu Ala Glu Ala Ala Pro
980 985 990

Leu Asp Glu Ile Thr Arg Leu Arg Tyr Glu Lys Glu Ala Leu Gly Ile
995 1000 1005

Tyr Val Ser Gly His Pro Ile Leu Arg Tyr Pro Gly Leu Arg Glu Thr
1010 1015 1020

Ala Thr Cys Thr Leu Glu Glu Leu Pro His Leu Ala Arg Asp Leu Pro
1025 1030 1035 1040
Pro Arg Ser Arg Val Leu Leu Ala Gly Met Val Glu Glu Val Val Arg
1045 1050 1055

Lys Pro Thr Lys Ser Gly Gly Met Met Ala Arg Phe Val Leu Ser Asp
1060 1065 1070

Glu Thr Gly Ala Leu Glu Ala Val Ala Phe Gly Arg Ala Tyr Asp Gln
1075 1080 1085

Val Ser Pro Arg Leu Lys Glu Asp Thr Pro Val Leu Val Leu Ala Glu
1090 1095 1100

Val Glu Arg Glu Glu Gly Gly Val Arg Val Leu Ala Gln Ala Val Trp
1105 1110 1115 1120

Thr Tyr Gln Glu Leu Glu Gln Val Pro Arg Ala Leu Glu Val Glu Val
1125 1130 1135

Glu Ala Ser Leu Pro Asp Asp Arg Gly Val Ala His Leu Lys Ser Leu
1140 1145 1150

Leu Asp Glu His Ala Gly Thr Leu Pro Leu Tyr Val Arg Val Gln Gly
1155 1160 1165

Ala Phe Gly Glu Ala Leu Leu Ala Leu Arg Glu Val Arg Val Gly Glu
1170 1175 1180

Glu Ala Leu Gly Ala Leu Glu Ala Ala Gly Phe Pro Ala Tyr Leu Leu
1185 1190 1195 1200

Pro Asn Arg Glu Val Ser Pro Arg Leu Thr Gly Ser Gly Gly Pro Arg
1205 1210 1215

Gly Arg Ala Leu Ser Thr Gly Leu Ala Leu Lys Thr Tyr Pro Ile Ala
1220 1225 1230

Leu Pro Gly Gly Asn Glu Ala Leu Ala Arg Pro Leu Leu
1235 1240 1245



<210> 88 
<211> 198 
<212> PRT 

<213> Thermus thermophilus 



<400> 88 

Val Glu Arg Val Val Arg Thr Leu Leu Asp Gly Arg Phe Leu Leu Glu
1 5 10 15

Glu Gly Val Gly Leu Trp Glu Trp Arg Tyr Pro Phe Pro Leu Glu Gly
20 25 30

Glu Ala Val Val Val Leu Asp Leu Glu Thr Thr Gly Leu Ala Gly Leu
35 40 45

Asp Glu Val Ile Glu Val Gly Leu Leu Arg Leu Glu Gly Gly Arg Arg
50 55 60

Leu Pro Phe Gln Ser Leu Val Arg Pro Leu Pro Pro Ala Glu Ala Arg
65 70 75 80

Ser Trp Asn Leu Thr Gly Ile Pro Arg Glu Ala Leu Glu Glu Ala Pro
85 90 95

Ser Leu Glu Glu Val Leu Glu Lys Ala Tyr Pro Leu Arg Gly Asp Ala
100 105 110

Thr Leu Val Ile His Asn Ala Ala Phe Asp Leu Gly Phe Leu Arg Pro
115 120 125

Ala Leu Glu Gly Leu Gly Tyr Arg Leu Glu Asn Pro Val Val Asp Ser
130 135 140

Leu Arg Leu Ala Arg Arg Gly Leu Pro Gly Leu Arg Arg Tyr Gly Leu
145 150 155 160

Asp Ala Leu Ser Glu Val Leu Glu Leu Pro Arg Arg Thr Cys His Arg
165 170 175

Ala Leu Glu Asp Val Glu Arg Thr Leu Ala Val Val His Glu Val Tyr
180 185 190

Tyr Met Leu Thr Ser Gly
195



<210> 89 
<211> 182 
<212> PRT 

<213> Deinococcus radiodurans 
<400> 89 

Pro Trp Pro Gln Asp Val Val Val Phe Asp Leu Glu Thr Thr Gly Phe
1 5 10 15

Ser Pro Ala Ser Ala Ala Ile Val Glu Ile Gly Ala Val Arg Ile Val
20 25 30

Gly Gly Gln Ile Asp Glu Thr Leu Lys Phe Glu Thr Leu Val Arg Pro
35 40 45

Thr Arg Pro Asp Gly Ser Met Leu Ser Ile Pro Trp Gln Ala Gln Arg
50 55 60

Val His Gly Ile Ser Asp Glu Met Val Arg Arg Ala Pro Ala Xaa Lys
65 70 75 80

Asp Val Leu Pro Asp Phe Phe Asp Phe Val Asp Gly Ser Ala Val Val
85 90 95

Ala His Asn Val Ser Phe Asp Gly Gly Phe Met Arg Ala Gly Ala Glu
100 105 110

Arg Leu Gly Leu Ser Trp Ala Pro Glu Arg Glu Leu Cys Thr Met Gln
115 120 125

Leu Ser Arg Arg Ala Phe Pro Arg Glu Arg Thr His Asn Leu Thr Val
130 135 140

Leu Ala Glu Arg Leu Gly Leu Glu Phe Ala Pro Gly Gly Arg His Arg
145 150 155 160

Ser Tyr Gly Asp Val Gln Val Thr Ala Gln Ala Tyr Leu Arg Leu Leu
165 170 175

Glu Leu Leu Gly Glu Arg
180



<210> 90 
<211> 201 
<212> PRT 

<213> Bacillus subtilis 
<400> 90 

His Gly Ile Lys Met Ile Tyr Gly Met Glu Ala Asn Leu Val Asp Asp
1 5 10 15

Gly Val Pro Ile Ala Tyr Asn Ala Ala His Arg Leu Leu Glu Glu Glu
20 25 30

Thr Tyr Val Val Phe Asp Val Glu Thr Thr Gly Leu Ser Ala Val Tyr
35 40 45

Asp Thr Ile Ile Glu Leu Ala Ala Val Lys Val Lys Gly Gly Glu Ile
50 55 60

Ile Asp Lys Phe Glu Ala Phe Ala Asn Pro His Arg Pro Leu Ser Ala
65 70 75 80

Thr Ile Ile Glu Leu Thr Gly Ile Thr Asp Asp Met Leu Gln Asp Ala
85 90 95

Pro Asp Val Val Asp Val Ile Arg Asp Phe Arg Glu Trp Ile Gly Asp
100 105 110

Asp Ile Leu Val Ala His Asn Ala Ser Phe Asp Met Gly Phe Leu Asn
115 120 125

Val Ala Tyr Lys Lys Leu Leu Glu Val Glu Lys Ala Lys Asn Pro Val
130 135 140

Ile Asp Thr Leu Glu Leu Gly Arg Phe Leu Tyr Pro Glu Phe Lys Asn
145 150 155 160

His Arg Leu Asn Thr Leu Cys Lys Lys Phe Asp Ile Glu Leu Thr Gln
165 170 175

His His Arg Ala Ile Tyr Asp Thr Glu Ala Thr Ala Tyr Leu Leu Leu
180 185 190

Lys Met Leu Lys Asp Ala Ala Glu Lys
195 200



<210> 91 
<211> 188 
<212> PRT 

<213> Haemophilus influenzae 
<400> 91 

Met Ile Asn Pro Asn Arg Gln Ile Val Leu Asp Thr Glu Thr Thr Gly
1 5 10 15

Met Asn Gln Leu Gly Ala His Tyr Glu Gly His Cys Ile Ile Glu Ile
20 25 30

Gly Ala Val Glu Leu Ile Asn Arg Arg Tyr Thr Gly Asn Asn Xaa His
35 40 45

Ile Tyr Ile Lys Pro Asp Arg Pro Xaa Asp Pro Asp Ala Ile Lys Val
50 55 60

His Gly Ile Thr Asp Glu Met Leu Ala Asp Lys Pro Glu Phe Lys Glu
65 70 75 80

Val Ala Gln Asp Phe Leu Asp Tyr Ile Asn Gly Ala Glu Leu Leu Ile
85 90 95

His Asn Ala Pro Phe Asp Val Gly Phe Met Asp Tyr Glu Phe Arg Lys
100 105 110

Leu Asn Leu Asn Val Lys Thr Asp Asp Ile Cys Leu Val Thr Asp Thr
115 120 125

Leu Gln Met Ala Arg Gln Met Tyr Pro Gly Lys Arg Asn Asn Leu Asp
130 135 140

Ala Leu Cys Asp Arg Leu Gly Ile Asp Asn Ser Lys Arg Thr Leu His
145 150 155 160

Gly Ala Leu Leu Asp Ala Glu Ile Leu Ala Asp Val Tyr Leu Met Met
165 170 175

Thr Gly Gly Gln Thr Asn Leu Phe Asp Glu Glu Glu
180 185



<210> 92 
<211> 189 
<212> PRT 

<213> Escherichia coli 
<400> 92 

Met Ser Thr Ala Ile Thr Arg Gln Ile Val Leu Asp Thr Glu Thr Thr
1 5 10 15

Gly Met Asn Gln Ile Gly Ala His Ser Glu Gly His Lys Ile Ile Glu
20 25 30

Ile Gly Ala Val Glu Val Val Asn Arg Arg Leu Thr Gly Asn Asn Phe
35 40 45

His Val Tyr Leu Lys Asp Arg Leu Val Asp Pro Glu Ala Phe Gly Val
50 55 60

His Gly Ile Ala Val Asp Phe Leu Leu Asp Lys Pro Thr Phe Ala Glu
65 70 75 80

Val Ala Val Glu Phe Met Asp Tyr Ile Arg Gly Ala Glu Leu Val Ile
85 90 95

His Asn Ala Ala Phe Asp Ile Gly Phe Met Asp Tyr Glu Phe Ser Leu
100 105 110

Leu Lys Arg Asp Ile Ala Lys Thr Asn Thr Phe Cys Lys Val Thr Asp
115 120 125

Ser Leu Ala Val Ala Arg Lys Met Phe Pro Gly Lys Arg Asn Ser Leu
130 135 140

Asp Ala Leu Cys Ala Arg Tyr Glu Ile Asp Asn Ser Lys Arg Thr Leu
145 150 155 160

His Gly Ala Leu Leu Asp Ala Gln Ile Leu Ala Glu Val Tyr Leu Ala
165 170 175

Met Thr Gly Gly Gln Thr Ser Met Ala Phe Ala Met Glu
180 185



<210> 93 
<211> 201 
<212> PRT 

<213> Helicobacter pylori 



<400> 93 

Asn Leu Glu Tyr Leu Lys Ala Cys Gly Leu Asn Phe Ile Glu Thr Ser
1 5 10 15

Glu Asn Leu Ile Thr Leu Lys Asn Leu Lys Thr Pro Leu Lys Asp Glu
20 25 30

Val Phe Ser Phe Ile Asp Leu Glu Thr Thr Gly Ser Cys Pro Ile Lys
35 40 45

His Glu Ile Leu Glu Ile Gly Ala Val Gln Val Lys Gly Gly Glu Ile
50 55 60

Ile Asn Arg Phe Glu Thr Leu Val Lys Val Lys Ser Val Pro Asp Tyr
65 70 75 80

Ile Ala Glu Leu Thr Gly Ile Thr Tyr Glu Asp Thr Leu Asn Ala Pro
85 90 95

Ser Ala His Glu Ala Leu Gln Glu Leu Arg Leu Phe Leu Gly Asn Ser
100 105 110

Val Phe Val Ala His Asn Ala Asn Phe Asp Tyr Asn Phe Leu Gly Arg
115 120 125

Tyr Phe Val Glu Lys Leu His Cys Pro Leu Leu Asn Leu Lys Leu Cys
130 135 140

Thr Leu Asp Leu Ser Lys Arg Ala Ile Leu Ser Met Arg Tyr Ser Leu
145 150 155 160

Ser Phe Leu Lys Glu Leu Leu Gly Phe Gly Ile Glu Val Ser His Arg
165 170 175

Ala Tyr Ala Asp Ala Leu Ala Ser Tyr Lys Leu Phe Glu Ile Cys Leu
180 185 190

Leu Asn Leu Pro Ser Tyr Ile Lys Thr
195 200



<210> 94 
<211> 630 
<212> DNA 

<213> Thermus thermophilus 
<400> 94 

atggtggagc gggtggtgcg gacccttctg gacgggaggt tcctcctgga ggagggggtg 60
gggctttggg agtggcgcta cccctttccc ctggaggggg aggcggtggt ggtcctggac 120
ctggagacca cggggcttgc cggcctggac gaggtgattg aggtgggcct cctccgcctg 180
gaggggggga ggcgcctccc cttccagagc ctcgtccggc ccctcccgcc cgccgaagcc 240
cgttcgtgga acctcaccgg catcccccgg gaggccctgg aggaggcccc ctccctggag 300
gaggttctgg agaaggccta ccccctccgc ggcgacgcca ccttggtgat ccacaacgcc 360
gcctttgacc tgggcttcct ccgcccggcc ttggagggcc tgggctaccg cctggaaaac 420
cccgtggtgg actccctgcg cttggccaga cggggcttac caggccttag gcgctacggc 480
ctggacgccc tctccgaggt cctggagctt ccccgaagga cctgccaccg ggccctcgag 540
gacgtggagc gcaccctcgc cgtggtgcac gaggtatact atatgcttac gtccggccgt 600
ccccgcacgc tttgggaact cgggaggtag 630



<210> 95 
<211> 210 
<212> PRT 

<213> Thermus thermophilus 
<400> 95 

Met Val Glu Arg Val Val Arg Thr Leu Leu Asp Gly Arg Phe Leu Leu
1 5 10 15

Glu Glu Gly Val Gly Leu Trp Glu Trp Arg Tyr Pro Phe Pro Leu Glu
20 25 30

Gly Glu Ala Val Val Val Leu Asp Leu Glu Thr Thr Gly Leu Ala Gly
35 40 45

Leu Asp Glu Val Ile Glu Val Gly Leu Leu Arg Leu Glu Gly Gly Arg
50 55 60

Arg Leu Pro Phe Gln Ser Leu Val Arg Pro Leu Pro Pro Ala Glu Ala
65 70 75 80

Arg Ser Trp Asn Leu Thr Gly Ile Pro Arg Glu Ala Leu Glu Glu Ala
85 90 95

Pro Ser Leu Glu Glu Val Leu Glu Lys Ala Tyr Pro Leu Arg Gly Asp
100 105 110

Ala Thr Leu Val Ile His Asn Ala Ala Phe Asp Leu Gly Phe Leu Arg
115 120 125

Pro Ala Leu Glu Gly Leu Gly Tyr Arg Leu Glu Asn Pro Val Val Asp
130 135 140

Ser Leu Arg Leu Ala Arg Arg Gly Leu Pro Gly Leu Arg Arg Tyr Gly
145 150 155 160

Leu Asp Ala Leu Ser Glu Val Leu Glu Leu Pro Arg Arg Thr Cys His
165 170 175

Arg Ala Leu Glu Asp Val Glu Arg Thr Leu Ala Val Val His Glu Val
180 185 190

Tyr Tyr Met Leu Thr Ser Gly Arg Pro Arg Thr Leu Trp Glu Leu Gly
195 200 205

Arg Glx
210



<210> 96 
<211> 461 
<212> PRT 

<213> Pseudomonas marcesans 
<400> 96 

Met Leu Glu Ala Ser Trp Glu Lys Val Gln Ser Ser Leu Lys Gln Asn
1 5 10 15

Leu Ser Lys Pro Ser Tyr Glu Thr Trp Ile Arg Pro Thr Glu Phe Ser
20 25 30

Gly Phe Lys Asn Gly Glu Leu Thr Leu Ile Ala Pro Asn Ser Phe Ser
35 40 45

Ser Ala Trp Leu Lys Asn Asn Tyr Ser Gln Thr Ile Gln Glu Thr Ala
50 55 60

Glu Glu Ile Phe Gly Glu Pro Val Thr Val His Val Lys Val Lys Ala
65 70 75 80

Asn Ala Glu Ser Ser Asp Glu His Tyr Ser Ser Ala Pro Ile Thr Pro
85 90 95

Pro Leu Glu Ala Ser Pro Gly Ser Val Asp Ser Ser Gly Ser Ser Leu
100 105 110

Arg Leu Ser Lys Lys Thr Leu Pro Leu Leu Asn Leu Arg Tyr Val Phe
115 120 125

Asn Arg Phe Val Val Gly Pro Asn Ser Arg Met Ala His Ala Ala Ala
130 135 140

Met Ala Val Ala Glu Ser Pro Gly Arg Glu Phe Asn Pro Leu Phe Ile
145 150 155 160

Cys Gly Gly Val Gly Leu Gly Lys Thr His Leu Met Gln Ala Ile Gly
165 170 175

His Tyr Arg Leu Glu Ile Asp Pro Gly Ala Lys Val Ser Tyr Val Ser
180 185 190

Thr Glu Thr Phe Thr Asn Asp Leu Ile Leu Ala Ile Arg Gln Asp Arg
195 200 205

Met Gln Ala Phe Arg Asp Arg Tyr Arg Ala Ala Asp Leu Ile Leu Val
210 215 220

Asp Asp Ile Gln Phe Ile Glu Gly Lys Glu Tyr Thr Gln Glu Glu Phe
225 230 235 240

Phe His Thr Phe Asn Ala Leu His Asp Ala Gly Ser Gln Ile Val Leu
245 250 255

Ala Ser Asp Arg Pro Pro Ser Gln Ile Pro Arg Leu Gln Glu Arg Leu
260 265 270
Met Ser Arg Phe Ser Met Gly Leu Ile Ala Asp Val Gln Ala Pro Asp
275 280 285

Leu Glu Thr Arg Met Ala Ile Leu Gln Lys Lys Ala Glu His Glu Arg
290 295 300

Val Gly Leu Pro Arg Asp Leu Ile Gln Phe Ile Ala Gly Arg Phe Thr
305 310 315 320

Ser Asn Ile Arg Glu Leu Glu Gly Ala Leu Thr Arg Ala Ile Ala Phe
325 330 335

Ala Ser Ile Thr Gly Leu Pro Met Thr Val Asp Ser Ile Ala Pro Met
340 345 350

Leu Asp Pro Asn Gly Gln Gly Val Glu Val Thr Pro Lys Gln Val Leu
355 360 365

Asp Lys Val Ala Glu Val Phe Lys Val Thr Pro Asp Glu Met Arg Ser
370 375 380

Ala Ser Arg Arg Arg Pro Val Ser Gln Ala Arg Gln Val Gly Met Tyr
385 390 395 400

Leu Met Arg Gln Gly Thr Asn Leu Ser Leu Pro Arg Ile Gly Asp Thr
405 410 415

Phe Gly Gly Lys Asp His Thr Thr Val Met Tyr Ala Ile Glu Gln Val
420 425 430

Glu Lys Lys Leu Ser Ser Asp Pro Gln Ile Ala Ser Gln Val Gln Lys
435 440 445

Ile Arg Asp Leu Leu Gln Ile Asp Ser Arg Arg Lys Arg
450 455 460



<210> 97 
<211> 447 
<212> PRT 

<213> Synechocystis sp. 
<400> 97 

Met Val Ser Cys Glu Asn Leu Trp Gln Gln Ala Leu Ala Ile Leu Ala
1 5 10 15

Thr Gln Leu Thr Lys Pro Ala Phe Asp Thr Trp Ile Lys Ala Ser Val
20 25 30

Leu Ile Ser Leu Gly Asp Gly Val Ala Thr Ile Gln Val Glu Asn Gly
35 40 45

Phe Val Leu Asn His Leu Gln Lys Ser Tyr Gly Pro Leu Leu Met Glu
50 55 60

Val Leu Thr Asp Leu Thr Gly Gln Glu Ile Thr Val Lys Leu Ile Thr
65 70 75 80

Asp Gly Leu Glu Pro His Ser Leu Ile Gly Gln Glu Ser Ser Leu Pro
85 90 95

Met Glu Thr Thr Pro Lys Asn Ala Thr Ala Leu Asn Gly Lys Tyr Thr
100 105 110

Phe Ser Arg Phe Val Val Gly Pro Thr Asn Arg Met Ala His Ala Ala
115 120 125

Ser Leu Ala Val Ala Glu Ser Pro Gly Arg Glu Phe Asn Pro Leu Phe
130 135 140

Leu Cys Gly Gly Val Gly Leu Gly Lys Thr His Leu Met Gln Ala Ile
145 150 155 160

Ala His Tyr Arg Leu Glu Met Tyr Pro Asn Ala Lys Val Tyr Tyr Val
165 170 175

Ser Thr Glu Arg Phe Thr Asn Asp Leu Ile Thr Ala Ile Arg Gln Asp
180 185 190

Asn Met Glu Asp Phe Arg Ser Tyr Tyr Arg Ser Ala Asp Phe Leu Leu
195 200 205

Ile Asp Asp Ile Gln Phe Ile Lys Gly Lys Glu Tyr Thr Gln Glu Glu
210 215 220

Phe Phe His Thr Phe Asn Ser Leu His Glu Ala Gly Lys Gln Val Val
225 230 235 240

Val Ala Ser Asp Arg Ala Pro Gln Arg Ile Pro Gly Leu Gln Asp Arg
245 250 255

Leu Ile Ser Arg Phe Ser Met Gly Leu Ile Ala Asp Ile Gln Val Pro
260 265 270

Asp Leu Glu Thr Arg Met Ala Ile Leu Gln Lys Lys Ala Glu Tyr Asp
275 280 285



Arg Ile Arg Leu Pro Lys Glu Val Ile Glu Tyr Ile Ala Ser His Tyr
290 295 300

Thr Ser Asn Ile Arg Glu Leu Glu Gly Ala Leu Ile Arg Ala Ile Ala
305 310 315 320

Tyr Thr Ser Leu Ser Asn Val Ala Met Thr Val Glu Asn Ile Ala Pro
325 330 335

Val Leu Asn Pro Pro Val Glu Lys Val Ala Ala Ala Pro Glu Thr Ile
340 345 350

Ile Thr Ile Val Ala Gln His Tyr Gln Leu Lys Val Glu Glu Leu Leu
355 360 365

Ser Asn Ser Arg Arg Arg Glu Val Ser Leu Ala Arg Gln Val Gly Met
370 375 380

Tyr Leu Met Arg Gln His Thr Asp Leu Ser Leu Pro Arg Ile Gly Glu
385 390 395 400

Ala Phe Gly Gly Lys Asp His Thr Thr Val Met Tyr Ser Cys Asp Lys
405 410 415

Ile Thr Gln Leu Gln Gln Lys Asp Trp Glu Thr Ser Gln Thr Leu Thr
420 425 430

Ser Leu Ser His Arg Ile Asn Ile Ala Gly Gln Ala Pro Glu Ser
435 440 445



<210> 98 
<211> 446 
<212> PRT 

<213> Bacillus subtilis 
<400> 98 

Met Glu Asn Ile Leu Asp Leu Trp Asn Gln Ala Leu Ala Gln Ile Glu
1 5 10 15

Lys Lys Leu Ser Lys Pro Ser Phe Glu Thr Trp Met Lys Ser Thr Lys
20 25 30

Ala His Ser Leu Gln Gly Asp Thr Leu Thr Ile Thr Ala Pro Asn Glu
35 40 45

Phe Ala Arg Asp Trp Leu Glu Ser Arg Tyr Leu His Leu Ile Ala Asp
50 55 60

Thr Ile Tyr Glu Leu Thr Gly Glu Glu Leu Ser Ile Lys Phe Val Ile
65 70 75 80

Pro Gln Asn Gln Asp Val Glu Asp Phe Met Pro Lys Pro Gln Val Lys
85 90 95

Lys Ala Val Lys Glu Asp Thr Ser Asp Phe Pro Gln Asn Met Leu Asn
100 105 110

Pro Lys Tyr Thr Phe Asp Thr Phe Val Ile Gly Ser Gly Asn Arg Phe
115 120 125

Ala His Ala Ala Ser Leu Ala Val Ala Glu Ala Pro Ala Lys Ala Tyr
130 135 140

Asn Pro Leu Phe Ile Tyr Gly Gly Val Gly Leu Gly Lys Thr His Leu
145 150 155 160

Met His Ala Ile Gly His Tyr Val Ile Asp His Asn Pro Ser Ala Lys
165 170 175

Val Val Tyr Leu Ser Ser Glu Lys Phe Thr Asn Glu Phe Ile Asn Ser
180 185 190

Ile Arg Asp Asn Lys Ala Val Asp Phe Arg Asn Arg Tyr Arg Asn Val
195 200 205

Asp Val Leu Leu Ile Asp Asp Ile Gln Phe Leu Ala Gly Lys Glu Gln
210 215 220

Thr Gln Glu Glu Phe Phe His Thr Phe Asn Thr Leu His Glu Glu Ser
225 230 235 240

Lys Gln Ile Val Ile Ser Ser Asp Arg Pro Pro Lys Glu Ile Pro Thr
245 250 255

Leu Glu Asp Arg Leu Arg Ser Arg Phe Glu Trp Gly Leu Ile Thr Asp
260 265 270

Ile Thr Pro Pro Asp Leu Glu Thr Arg Ile Ala Ile Leu Arg Lys Lys
275 280 285

Ala Lys Ala Glu Gly Leu Asp Ile Pro Asn Glu Val Met Leu Tyr Ile
290 295 300

Ala Asn Gln Ile Asp Ser Asn Ile Arg Glu Leu Glu Gly Ala Leu Ile
305 310 315 320

Arg Val Val Ala Tyr Ser Ser Leu Ile Asn Lys Asp Ile Asn Ala Asp
325 330 335

Leu Ala Ala Glu Ala Leu Lys Asp Ile Ile Pro Ser Ser Lys Pro Lys
340 345 350

Val Ile Thr Ile Lys Glu Ile Gln Arg Val Val Gly Gln Gln Phe Asn
355 360 365

Ile Lys Leu Glu Asp Phe Lys Ala Lys Lys Arg Thr Lys Ser Val Ala
370 375 380

Phe Pro Arg Gln Ile Ala Met Tyr Leu Ser Arg Glu Met Thr Asp Ser
385 390 395 400

Ser Leu Pro Lys Ile Gly Glu Glu Phe Gly Gly Arg Asp His Thr Thr
405 410 415

Val Ile His Ala His Glu Lys Ile Ser Lys Leu Leu Ala Asp Asp Glu
420 425 430

Gln Leu Gln Gln His Val Lys Glu Ile Lys Glu Gln Leu Lys
435 440 445



<210> 99 
<211> 507 
<212> PRT 

<213> Mycobacterium tuberculosis 
<400> 99 

Met Thr Asp Asp Pro Gly Ser Gly Phe Thr Thr Val Trp Asn Ala Val
1 5 10 15

Val Ser Glu Leu Asn Gly Asp Pro Lys Val Asp Asp Gly Pro Ser Ser
20 25 30

Asp Ala Asn Leu Ser Ala Pro Leu Thr Pro Gln Gln Arg Ala Trp Leu
35 40 45

Asn Leu Val Gln Pro Leu Thr Ile Val Glu Gly Phe Ala Leu Leu Ser
50 55 60

Val Pro Ser Ser Phe Val Gln Asn Glu Ile Glu Arg His Leu Arg Ala
65 70 75 80

Pro Ile Thr Asp Ala Leu Ser Arg Arg Leu Gly His Gln Ile Gln Leu
85 90 95

Gly Val Arg Ile Ala Pro Pro Ala Thr Asp Glu Ala Asp Asp Thr Thr
100 105 110

Val Pro Pro Ser Glu Asn Pro Ala Thr Thr Ser Pro Asp Thr Thr Thr
115 120 125

Asp Asn Asp Glu Ile Asp Asp Ser Ala Ala Ala Arg Gly Asp Asn Gln
130 135 140

His Ser Trp Pro Ser Tyr Phe Thr Glu Arg Pro His Asn Thr Asp Ser 
145 150 155 160 

Ala Thr Ala Gly Val Thr Ser Leu Asn Arg Arg Tyr Thr Phe Asp Thr 
165 170 175 

Phe Val Ile Gly Ala Ser Asn Arg Phe Ala His Ala Ala Ala Leu Ala
180 185 190

Ile Ala Glu Ala Pro Ala Arg Ala Tyr Asn Pro Leu Phe Ile Trp Gly
195 200 205

Glu Ser Gly Leu Gly Lys Thr His Leu Leu His Ala Ala Gly Asn Tyr
210 215 220

Ala Gln Arg Leu Phe Pro Gly Met Arg Val Lys Tyr Val Ser Thr Glu
225 230 235 240

Glu Phe Thr Asn Asp Phe Ile Asn Ser Leu Arg Asp Asp Arg Lys Val
245 250 255

Ala Phe Lys Arg Ser Tyr Arg Asp Val Asp Val Leu Leu Val Asp Asp
260 265 270

Ile Gln Phe Ile Glu Gly Lys Glu Gly Ile Gln Glu Glu Phe Phe His
275 280 285

Thr Phe Asn Thr Leu His Asn Ala Asn Lys Gln Ile Val Ile Ser Ser
290 295 300

Asp Arg Pro Pro Lys Gln Leu Ala Thr Leu Glu Asp Arg Leu Arg Thr
305 310 315 320

Arg Phe Glu Trp Gly Leu Ile Thr Asp Val Gln Pro Pro Glu Leu Glu
325 330 335

Thr Arg Ile Ala Ile Leu Arg Lys Lys Ala Gln Met Glu Arg Leu Ala
340 345 350



Val Pro Asp Asp Val Leu Glu Leu Ile Ala Ser Ser Ile Glu Arg Asn
355 360 365

Ile Arg Glu Leu Glu Gly Ala Leu Ile Arg Val Thr Ala Phe Ala Ser
370 375 380

Leu Asn Lys Thr Pro Ile Asp Lys Ala Leu Ala Glu Ile Val Leu Arg
385 390 395 400

Asp Leu Ile Ala Asp Ala Asn Thr Met Gln Ile Ser Ala Ala Thr Ile
405 410 415

Met Ala Ala Thr Ala Glu Tyr Phe Asp Thr Thr Val Glu Glu Leu Arg
420 425 430

Gly Pro Gly Lys Thr Arg Ala Leu Ala Gln Ser Arg Gln Ile Ala Met
435 440 445

Tyr Leu Cys Arg Glu Leu Thr Asp Leu Ser Leu Pro Lys Ile Gly Gln
450 455 460

Ala Phe Gly Arg Asp His Thr Thr Val Met Tyr Ala Gln Arg Lys Ile
465 470 475 480

Leu Ser Glu Met Ala Glu Arg Arg Glu Val Phe Asp His Val Lys Glu
485 490 495

Leu Thr Thr Arg Ile Arg Gln Arg Ser Lys Arg
500 505



<210> 100 
<211> 446 
<212> PRT 

<213> Thermus thermophilus 
<400> 100 

Met Ser His Glu Ala Val Trp Gln His Val Leu Glu His Ile Arg Arg
1 5 10 15

Ser Ile Thr Glu Val Glu Phe His Thr Trp Phe Glu Arg Ile Arg Pro
20 25 30

Leu Gly Ile Arg Asp Gly Val Leu Glu Leu Ala Val Pro Thr Ser Phe
35 40 45

Ala Leu Asp Trp Ile Arg Arg His Tyr Ala Gly Leu Ile Gln Glu Gly
50 55 60

Pro Arg Leu Leu Gly Ala Gln Ala Pro Arg Phe Glu Leu Arg Val Val
65 70 75 80

Pro Gly Val Val Val Gln Glu Asp Ile Phe Gln Pro Pro Pro Ser Pro
85 90 95

Pro Ala Gln Ala Gln Pro Glu Asp Thr Phe Lys Thr Ser Trp Trp Gly
100 105 110

Pro Thr Thr Pro Trp Pro His Gly Gly Ala Val Ala Val Ala Glu Ser 
115 120 125 

Pro Gly Arg Ala Tyr Asn Pro Leu Phe Ile Tyr Gly Gly Arg Gly Leu
130 135 140

Gly Lys Thr Tyr Leu Met His Ala Val Gly Pro Leu Arg Ala Lys Arg
145 150 155 160

Phe Pro His Met Arg Leu Glu Tyr Val Ser Thr Glu Thr Phe Thr Asn
165 170 175

Glu Leu Ile Asn Arg Pro Ser Ala Arg Asp Arg Met Thr Glu Phe Arg
180 185 190

Glu Arg Tyr Arg Ser Val Asp Leu Leu Leu Val Asp Asp Val Gln Phe
195 200 205

Ile Ala Gly Lys Glu Arg Thr Gln Glu Glu Phe Phe His Thr Phe Asn
210 215 220

Ala Leu Tyr Glu Ala His Lys Gln Ile Ile Leu Ser Ser Asp Arg Pro
225 230 235 240

Pro Lys Asp Ile Leu Thr Leu Glu Ala Arg Leu Arg Ser Arg Phe Glu
245 250 255

Trp Gly Leu Ile Thr Asp Asn Pro Ala Pro Asp Leu Glu Thr Arg Ile
260 265 270

Ala Ile Leu Lys Met Asn Ala Ser Ser Gly Pro Glu Asp Pro Glu Asp
275 280 285

Ala Leu Glu Tyr Ile Ala Arg Gln Val Thr Ser Asn Ile Arg Glu Trp
290 295 300

Glu Gly Ala Leu Met Arg Ala Ser Pro Phe Ala Ser Leu Asn Gly Val
305 310 315 320

Glu Leu Thr Arg Ala Val Ala Ala Lys Ala Leu Arg His Leu Arg Pro
325 330 335

Arg Glu Leu Glu Ala Asp Pro Leu Glu Ile Ile Arg Lys Ala Ala Gly
340 345 350

Pro Val Arg Pro Glu Thr Pro Gly Gly Ala His Gly Glu Arg Arg Lys
355 360 365

Lys Glu Val Val Leu Pro Arg Gln Leu Ala Met Tyr Leu Val Arg Glu
370 375 380

Leu Thr Pro Ala Ser Leu Pro Glu Ile Gly Gln Leu Phe Gly Gly Arg
385 390 395 400

Asp His Thr Thr Val Arg Tyr Ala Ile Gln Lys Val Gln Glu Leu Ala
405 410 415

Gly Lys Pro Asp Arg Glu Val Gln Gly Leu Leu Arg Thr Leu Arg Glu
420 425 430

Ala Cys Thr Asp Pro Val Asp Asn Leu Trp Ile Thr Cys Gly
435 440 445



<210> 101 
<211> 467 
<212> PRT 

<213> Escherichia coli 
<400> 101 

Met Ser Leu Ser Leu Trp Gln Gln Cys Leu Ala Arg Leu Gln Asp Glu
1 5 10 15

Leu Pro Ala Thr Glu Phe Ser Met Trp Ile Arg Pro Leu Gln Ala Glu
20 25 30

Leu Ser Asp Asn Thr Leu Ala Leu Tyr Ala Pro Asn Arg Phe Val Leu
35 40 45

Asp Trp Val Arg Asp Lys Tyr Leu Asn Asn Ile Asn Gly Leu Leu Thr
50 55 60

Ser Phe Cys Gly Ala Asp Ala Pro Gln Leu Arg Phe Glu Val Gly Thr
65 70 75 80

Lys Pro Val Thr Gln Thr Pro Gln Ala Ala Val Thr Ser Asn Val Ala
85 90 95

Ala Pro Ala Gln Val Ala Gln Thr Gln Pro Gln Arg Ala Ala Pro Ser
100 105 110

Thr Arg Ser Gly Trp Asp Asn Val Pro Ala Pro Ala Glu Pro Thr Tyr 
115 120 125 

Arg Ser Asn Val Asn Val Lys His Thr Phe Asp Asn Phe Val Glu Gly 
130 135 140 

Lys Ser Asn Gln Leu Ala Arg Ala Ala Ala Arg Gln Val Ala Asp Asn
145 150 155 160

Pro Gly Gly Ala Tyr Asn Pro Leu Phe Leu Tyr Gly Gly Thr Gly Leu
165 170 175

Gly Lys Thr His Leu Leu His Ala Val Gly Asn Gly Ile Met Ala Arg
180 185 190

Lys Pro Asn Ala Lys Val Val Tyr Met His Ser Glu Arg Phe Val Gln
195 200 205

Asp Met Val Lys Ala Leu Gln Asn Asn Ala Ile Glu Glu Phe Lys Arg
210 215 220

Tyr Tyr Arg Ser Val Asp Ala Leu Leu Ile Asp Asp Ile Gln Phe Phe
225 230 235 240

Ala Asn Lys Glu Arg Ser Gln Glu Glu Phe Phe His Thr Phe Asn Ala
245 250 255

Leu Leu Glu Gly Asn Gln Gln Ile Ile Leu Thr Ser Asp Arg Tyr Pro
260 265 270

Lys Glu Ile Asn Gly Val Glu Asp Arg Leu Lys Ser Arg Phe Gly Trp
275 280 285

Gly Leu Thr Val Ala Ile Glu Pro Pro Glu Leu Glu Thr Arg Val Ala
290 295 300

Ile Leu Met Lys Lys Ala Asp Glu Asn Asp Ile Arg Leu Pro Gly Glu
305 310 315 320

Val Ala Phe Phe Ile Ala Lys Arg Leu Arg Ser Asn Val Arg Glu Leu
325 330 335

Glu Gly Ala Leu Asn Arg Val Ile Ala Asn Ala Asn Phe Thr Gly Arg
340 345 350

Ala Ile Thr Ile Asp Phe Val Arg Glu Ala Leu Arg Asp Leu Leu Ala
355 360 365

Leu Gln Glu Lys Leu Val Thr Ile Asp Asn Ile Gln Lys Thr Val Ala
370 375 380

Glu Tyr Tyr Lys Ile Lys Val Ala Asp Leu Leu Ser Lys Arg Arg Ser
385 390 395 400

Arg Ser Val Ala Arg Pro Arg Gln Met Ala Met Ala Leu Ala Lys Glu
405 410 415

Leu Thr Asn His Ser Leu Pro Glu Ile Gly Asp Ala Phe Gly Gly Arg
420 425 430

Asp His Thr Thr Val Leu His Ala Cys Arg Lys Ile Glu Gln Leu Arg
435 440 445

Glu Glu Ser His Asp Ile Lys Glu Asp Phe Ser Asn Leu Ile Arg Thr
450 455 460

Leu Ser Ser
465



<210> 102 
<211> 440 
<212> PRT 

<213> Thermotoga maritima
<400> 102 

Met Lys Glu Arg Ile Leu Gln Glu Ile Lys Thr Arg Val Asn Arg Lys
1 5 10 15

Ser Trp Glu Leu Trp Phe Ser Ser Phe Asp Val Lys Ser Ile Glu Gly
20 25 30

Asn Lys Val Val Phe Ser Val Gly Asn Leu Phe Ile Lys Glu Trp Leu
35 40 45

Glu Lys Lys Tyr Tyr Ser Val Leu Ser Lys Ala Val Lys Val Val Leu 
50 55 60 






Gly Asn Asp Ala Thr Phe Glu Ile Thr Tyr Glu Ala Phe Glu Pro His
65 70 75 80 



Ser Tyr Ser Glu Pro Leu Val Lys Lys Arg Ala Val Leu Leu Thr 



Pro Leu Asn Pro Asp Tyr Thr Phe Glu Asn Phe Val Val Gly Pro Gly
100 105 110

Asn Ser Phe Ala Tyr His Ala Ala Leu Glu Val Ala Lys His Pro Gly
115 120 125

Arg Tyr Asn Pro Leu Phe Ile Tyr Gly Gly Val Gly Leu Gly Lys Thr
130 135 140

His Leu Leu Gln Ser Ile Gly Asn Tyr Val Val Gln Asn Glu Pro Asp
145 150 155 160

Leu Arg Val Met Tyr Ile Thr Ser Glu Lys Phe Leu Asn Asp Leu Val
165 170 175

Asp Ser Met Lys Glu Gly Lys Leu Asn Glu Phe Arg Glu Lys Tyr Arg
180 185 190

Lys Lys Val Asp Ile Leu Leu Ile Asp Asp Val Gln Phe Leu Ile Gly
195 200 205

Lys Thr Gly Val Gln Thr Glu Leu Phe His Thr Phe Asn Glu Leu His
210 215 220

Asp Ser Gly Lys Gln Ile Val Ile Cys Ser Asp Arg Glu Pro Gln Lys
225 230 235 240

Leu Ser Glu Phe Gln Asp Arg Leu Val Ser Arg Phe Gln Met Gly Leu
245 250 255

Val Ala Lys Leu Glu Pro Pro Asp Glu Glu Thr Arg Lys Ser Ile Ala
260 265 270

Arg Lys Met Leu Glu Ile Glu His Gly Glu Leu Pro Glu Glu Val Leu
275 280 285

Asn Phe Val Ala Glu Asn Val Asp Asp Asn Leu Arg Arg Leu Arg Gly 
290 295 300 

Ala Ile Ile Lys Leu Leu Val Tyr Lys Glu Thr Thr Gly Lys Glu Val
305 310 315 320

Asp Leu Lys Glu Ala Ile Leu Leu Leu Lys Asp Phe Ile Lys Pro Asn
325 330 335

Arg Val Lys Ala Met Asp Pro Ile Asp Glu Leu Ile Glu Ile Val Ala
340 345 350

Lys Val Thr Gly Val Pro Arg Glu Glu Ile Leu Ser Asn Ser Arg Asn
355 360 365

Val Lys Ala Leu Thr Ala Arg Arg Ile Gly Met Tyr Val Ala Lys Asn
370 375 380

Tyr Leu Lys Ser Ser Leu Arg Thr Ile Ala Glu Lys Phe Asn Arg Ser
385 390 395 400

His Pro Val Val Val Asp Ser Val Lys Lys Val Lys Asp Ser Leu Leu
405 410 415

Lys Gly Asn Lys Gln Leu Lys Ala Leu Ile Asp Glu Val Ile Gly Glu
420 425 430

Ile Ser Arg Arg Ala Leu Ser Gly
435 440



<210> 103 
<211> 457 
<212> PRT 

<213> Helicobacter pylori 
<400> 103 

Met Asp Thr Asn Asn Asn Ile Glu Lys Glu Ile Leu Ala Leu Val Lys
1 5 10 15

Gln Asn Pro Lys Val Ser Leu Ile Glu Tyr Glu Asn Tyr Phe Ser Gln
20 25 30

Leu Lys Tyr Asn Pro Asn Ala Ser Lys Ser Asp Ile Ala Phe Phe Tyr
35 40 45

Ala Pro Asn Gln Val Leu Cys Thr Thr Ile Thr Ala Lys Tyr Gly Ala
50 55 60

Leu Leu Lys Glu Ile Leu Ser Gln Asn Lys Val Gly Met His Leu Ala
65 70 75 80

His Ser Val Asp Val Arg Ile Glu Val Ala Pro Lys Ile Gln Ile Asn
85 90 95

Ala Gln Ser Asn Ile Asn Tyr Lys Ala Ile Lys Thr Ser Val Lys Asp
100 105 110

Ser Tyr Thr Phe Glu Asn Phe Val Val Gly Ser Cys Asn Asn Thr Val
115 120 125

Tyr Glu Ile Ala Lys Lys Val Ala Gln Ser Asp Thr Pro Pro Tyr Asn
130 135 140

Pro Val Leu Phe Tyr Gly Gly Thr Gly Leu Gly Lys Thr His Ile Leu
145 150 155 160

Asn Ala Ile Gly Asn His Ala Leu Glu Lys His Lys Lys Val Val Leu
165 170 175

Val Thr Ser Glu Asp Phe Leu Thr Asp Phe Leu Lys His Leu Asp Asn
180 185 190

Lys Thr Met Asp Ser Phe Lys Ala Lys Tyr Arg His Cys Asp Phe Phe
195 200 205

Leu Leu Asp Asp Ala Gln Phe Leu Gln Gly Lys Pro Lys Leu Glu Glu
210 215 220

Glu Phe Phe His Thr Phe Asn Glu Leu His Ala Asn Ser Lys Gln Ile
225 230 235 240

Val Leu Ile Ser Asp Arg Ser Pro Lys Asn Ile Ala Gly Leu Glu Asp
245 250 255

Arg Leu Lys Ser Arg Phe Glu Trp Gly Ile Thr Ala Lys Val Met Pro
260 265 270

Pro Asp Leu Glu Thr Lys Leu Ser Ile Val Lys Gln Lys Cys Gln Leu
275 280 285

Asn Gln Ile Thr Leu Pro Glu Glu Val Met Glu Tyr Ile Ala Gln His
290 295 300

Ile Ser Asp Asn Ile Arg Gln Met Glu Gly Ala Ile Ile Lys Ile Ser
305 310 315 320

Val Asn Ala Asn Leu Met Asn Ala Ser Ile Asp Leu Asn Leu Ala Lys
325 330 335

Thr Val Leu Glu Asp Leu Gln Lys Asp His Ala Glu Gly Ser Ser Leu
340 345 350

Glu Asn Ile Leu Leu Ala Val Ala Gln Ser Leu Asn Leu Lys Ser Ser
355 360 365

Glu Ile Lys Val Ser Ser Arg Gln Lys Asn Val Ala Leu Ala Arg Lys
370 375 380

Leu Val Val Tyr Phe Ala Arg Leu Tyr Thr Pro Asn Pro Thr Leu Ser
385 390 395 400

Leu Ala Gln Phe Leu Asp Leu Lys Asp His Ser Ser Ile Ser Lys Met
405 410 415

Tyr Ser Gly Val Lys Lys Met Leu Glu Glu Glu Lys Ser Pro Phe Val
420 425 430

Leu Ser Leu Arg Glu Glu Ile Lys Asn Arg Leu Asn Glu Leu Asn Asp
435 440 445

Lys Lys Thr Ala Phe Asn Ser Ser Glu
450 455



<210> 104 
<211> 1305 
<212> DNA 

<213> Thermus thermophilus 
<400> 104 

gtgtcgcacg aggccgtctg gcaacacgtt ctggagcaca tccgccgcag catcaccgag 60
gtggagttcc acacctggtt tgaaaggatc cgccccttgg ggatccggga cggggtgctg 120
gagctcgccg tgcccacctc ctttgccctg gactggatcc ggcgccacta cgccggcctc 180
atccaggagg gccctcggct cctcggggcc caggcgcccc ggtttgagct ccgggtggtg 240
cccggggtcg tagtccagga ggacatcttc cagcccccgc cgagcccccc ggcccaagct 300
caacccgaag atacctttaa aacttcgtgg tggggcccaa caactccatg gccccacggc 360
ggcgccgtgg ccgtggccga gtcccccggc cgggcctaca accccctctt catctacggg 420
ggccgtggcc tgggaaagac ctacctgatg cacgccgtgg gcccactccg tgcgaagcgc 480
ttcccccaca tgagattaga gtacgtttcc acggaaactt tcaccaacga gctcatcaac 540
cggccatccg cgagggaccg gatgacggag ttccgggagc ggtaccgctc cgtggacctc 600
ctgctggtgg acgacgtcca gttcatcgcc ggaaaggagc gcacccagga ggagtttttc 660
cacaccttca acgcccttta cgaggcccac aagcagatca tcctctcctc cgaccggccg 720
cccaaggaca tcctcaccct ggaggcgcgc ctgcggagcc gctttgagtg gggcctgatc 780
accgacaatc cagcccccga cctggaaacc cggatcgcca tcctgaagat gaacgccagc 840
agcgggcctg aggatcccga ggacgccctg gagtacatcg cccggcaggt cacctccaac 900
atccgggagt gggaaggggc cctcatgcgg gcatcgcctt tcgcctccct caacggcgtt 960
gagctgaccc gcgccgtggc ggccaaggct ctccgacatc ttcgccccag ggagctggag 1020
gcggacccct tggagatcat ccgcaaagcg gcgggaccag ttcggcctga aaccccggga 1080
ggagctcacg gggagcgccg caagaaggag gtggtcctcc cccggcagct cgccatgtac 1140
ctggtgcggg agctcacccc ggcctccctg cccgagatcg accagctcaa cgacgaccgg 1200
gaccacacca cggtcctcta cgccatccag aaggtccagg agctcgcgga aagcgaccgg 1260
gaggtgcagg gcctcctccg caccctccgg gaggcgtgca catga 1305



<210> 105 
<211> 434 
<212> PRT 

<213> Thermus thermophilus 
<400> 105 

Val Ser His Glu Ala Val Trp Gln His Val Leu Glu His Ile Arg Arg
1 5 10 15

Ser Ile Thr Glu Val Glu Phe His Thr Trp Phe Glu Arg Ile Arg Pro
20 25 30

Leu Gly Ile Arg Asp Gly Val Leu Glu Leu Ala Val Pro Thr Ser Phe
35 40 45

Ala Leu Asp Trp Ile Arg Arg His Tyr Ala Gly Leu Ile Gln Glu Gly
50 55 60

Pro Arg Leu Leu Gly Ala Gln Ala Pro Arg Phe Glu Leu Arg Val Val
65 70 75 80

Pro Gly Val Val Val Gln Glu Asp Ile Phe Gln Pro Pro Pro Ser Pro
85 90 95

Pro Ala Gln Ala Gln Pro Glu Asp Thr Phe Lys Thr Ser Trp Trp Gly
100 105 110

Pro Thr Thr Pro Trp Pro His Gly Gly Ala Val Ala Val Ala Glu Ser 
115 120 125 

Pro Gly Arg Ala Tyr Asn Pro Leu Phe Ile Tyr Gly Gly Arg Gly Leu
130 135 140

Gly Lys Thr Tyr Leu Met His Ala Val Gly Pro Leu Arg Ala Lys Arg
145 150 155 160

Phe Pro His Met Arg Leu Glu Tyr Val Ser Thr Glu Thr Phe Thr Asn
165 170 175

Glu Leu Ile Asn Arg Pro Ser Ala Arg Asp Arg Met Thr Glu Phe Arg
180 185 190

Glu Arg Tyr Arg Ser Val Asp Leu Leu Leu Val Asp Asp Val Gln Phe
195 200 205

Ile Ala Gly Lys Glu Arg Thr Gln Glu Glu Phe Phe His Thr Phe Asn
210 215 220

Ala Leu Tyr Glu Ala His Lys Gln Ile Ile Leu Ser Ser Asp Arg Pro
225 230 235 240

Pro Lys Asp Ile Leu Thr Leu Glu Ala Arg Leu Arg Ser Arg Phe Glu
245 250 255

Trp Gly Leu Ile Thr Asp Asn Pro Ala Pro Asp Leu Glu Thr Arg Ile
260 265 270

Ala Ile Leu Lys Met Asn Ala Ser Ser Gly Pro Glu Asp Pro Glu Asp
275 280 285

Ala Leu Glu Tyr Ile Ala Arg Gln Val Thr Ser Asn Ile Arg Glu Trp
290 295 300

Glu Gly Ala Leu Met Arg Ala Ser Pro Phe Ala Ser Leu Asn Gly Val
305 310 315 320

Glu Leu Thr Arg Ala Val Ala Ala Lys Ala Leu Arg His Leu Arg Pro
325 330 335

Arg Glu Leu Glu Ala Asp Pro Leu Glu Ile Ile Arg Lys Ala Ala Gly
340 345 350

Pro Val Arg Pro Glu Thr Pro Gly Gly Ala His Gly Glu Arg Arg Lys
355 360 365

Lys Glu Val Val Leu Pro Arg Gln Leu Ala Met Tyr Leu Val Arg Glu
370 375 380

Leu Thr Pro Ala Ser Leu Pro Glu Ile Asp Gln Leu Asn Asp Asp Arg
385 390 395 400

Asp His Thr Thr Val Leu Tyr Ala Ile Gln Lys Val Gln Glu Leu Ala
405 410 415

Glu Ser Asp Arg Glu Val Gln Gly Leu Leu Arg Thr Leu Arg Glu Ala
420 425 430



Cys Thr



<210> 106
<211> 1128 
<212> DNA 

<213> Thermus thermophilus 
<400> 106 

atgaacataa cggttcccaa aaaactcctc tcggaccagc tttccctcct ggagcgcatc 60
gtcccctcta gaagcgccaa ccccctctac acctacctgg ggctttacgc cgaggaaggg 120
gccttgatcc tcttcgggac caacggggag gtggacctcg aggtccgcct ccccgccgag 180
gcccaaagcc ttccccgggt gctcgtcccc gcccagccct tcttccagct ggtgcggagc 240
cttcctgggg acctcgtggc cctcggcctc gcctcggagc cgggccaggg ggggcagctg 300
gagctctcct ccgggcgttt ccgcacccgg ctcagcctgg cccctgccga gggctacccc 360
gagcttctgg tgcccgaggg ggaggacaag ggggccttcc ccctccggac gcggatgccc 420
tccggggagc tcgtcaaggc cttgacccac gtgcgctacg ccgcgagcaa cgaggagtac 480
cgggccatct tccgcggggt gcagctggag ttctcccccc agggcttccg ggcggtggcc 540
tccgacgggt accgcctcgc cctctacgac ctgcccctgc cccaagggtt ccaggccaag 600
gccgtggtcc ccgcccggag cgtggacgag atggtgcggg tcctgaaggg ggcggacggg 660
gccgaggccg tcctcgccct gggcgagggg gtgttggccc tggccctcga gggcggaagc 720
ggggtccgga tggccctccg cctcatggaa ggggagttcc ccgactacca gagggtcatc 780
ccccaggagt tcgccctcaa ggtccaggtg gagggggagg ccctcaggga ggcggtgcgc 840
cgggtgagcg tcctctccga ccggcagaac caccgggtgg acctcctttt ggaggaaggc 900
cggatcctcc tctccgccga gggggactac ggcaaggggc aggaggaggt gcccgcccag 960
gtggaggggc cggacatggc cgtggcctac aacgcccgct acctcctcga ggccctcgcc 1020
cccgtggggg accgggccca cctgggcatc tccgggccca cgagcccgag cctcatctgg 1080
ggggacgggg aggggtaccg ggcggtggtg gtgcccctca gggtctag 1128



<210> 107 
<211> 376 
<212> PRT 

<213> Thermus thermophilus 



<400> 107 

Met Asn Ile Thr Val Pro Lys Lys Leu Leu Ser Asp Gln Leu Ser Leu
1 5 10 15

Leu Glu Arg Ile Val Pro Ser Arg Ser Ala Asn Pro Leu Tyr Thr Tyr
20 25 30

Leu Gly Leu Tyr Ala Glu Glu Gly Ala Leu Ile Leu Phe Gly Thr Asn
35 40 45

Gly Glu Val Asp Leu Glu Val Arg Leu Pro Ala Glu Ala Gln Ser Leu
50 55 60

Pro Arg Val Leu Val Pro Ala Gln Pro Phe Phe Gln Leu Val Arg Ser
65 70 75 80

Leu Pro Gly Asp Leu Val Ala Leu Gly Leu Ala Ser Glu Pro Gly Gln
85 90 95

Gly Gly Gln Leu Glu Leu Ser Ser Gly Arg Phe Arg Thr Arg Leu Ser
100 105 110

Leu Ala Pro Ala Glu Gly Tyr Pro Glu Leu Leu Val Pro Glu Gly Glu
115 120 125

Asp Lys Gly Ala Phe Pro Leu Arg Thr Arg Met Pro Ser Gly Glu Leu
130 135 140

Val Lys Ala Leu Thr His Val Arg Tyr Ala Ala Ser Asn Glu Glu Tyr
145 150 155 160

Arg Ala Ile Phe Arg Gly Val Gln Leu Glu Phe Ser Pro Gln Gly Phe
165 170 175

Arg Ala Val Ala Ser Asp Gly Tyr Arg Leu Ala Leu Tyr Asp Leu Pro
180 185 190

Leu Pro Gln Gly Phe Gln Ala Lys Ala Val Val Pro Ala Arg Ser Val
195 200 205

Asp Glu Met Val Arg Val Leu Lys Gly Ala Asp Gly Ala Glu Ala Val
210 215 220

Leu Ala Leu Gly Glu Gly Val Leu Ala Leu Ala Leu Glu Gly Gly Ser
225 230 235 240

Gly Val Arg Met Ala Leu Arg Leu Met Glu Gly Glu Phe Pro Asp Tyr
245 250 255

Gln Arg Val Ile Pro Gln Glu Phe Ala Leu Lys Val Gln Val Glu Gly
260 265 270

Glu Ala Leu Arg Glu Ala Val Arg Arg Val Ser Val Leu Ser Asp Arg
275 280 285

Gln Asn His Arg Val Asp Leu Leu Leu Glu Glu Gly Arg Ile Leu Leu
290 295 300

Ser Ala Glu Gly Asp Tyr Gly Lys Gly Gln Glu Glu Val Pro Ala Gln
305 310 315 320

Val Glu Gly Pro Asp Met Ala Val Ala Tyr Asn Ala Arg Tyr Leu Leu
325 330 335

Glu Ala Leu Ala Pro Val Gly Asp Arg Ala His Leu Gly Ile Ser Gly
340 345 350

Pro Thr Ser Pro Ser Leu Ile Trp Gly Asp Gly Glu Gly Tyr Arg Ala
355 360 365

Val Val Val Pro Leu Arg Val Glx
370 375



<210> 108 
<211> 376 
<212> PRT 

<213> Thermus thermophilus
<400> 108 

Met Asn Ile Thr Val Pro Lys Lys Leu Leu Ser Asp Gln Leu Ser Leu
1 5 10 15

Leu Glu Arg Ile Val Pro Ser Arg Ser Ala Asn Pro Leu Tyr Thr Tyr
20 25 30

Leu Gly Leu Tyr Ala Glu Glu Gly Ala Leu Ile Leu Phe Gly Thr Asn
35 40 45

Gly Glu Val Asp Leu Glu Val Arg Leu Pro Ala Glu Ala Gln Ser Leu
50 55 60

Pro Arg Val Leu Val Pro Ala Gln Pro Phe Phe Gln Leu Val Arg Ser
65 70 75 80

Leu Pro Gly Asp Leu Val Ala Leu Gly Leu Ala Ser Glu Pro Gly Gln
85 90 95

Gly Gly Gln Leu Glu Leu Ser Ser Gly Arg Phe Arg Thr Arg Leu Ser
100 105 110

Leu Ala Pro Ala Glu Gly Tyr Pro Glu Leu Leu Val Pro Glu Gly Glu
115 120 125

Asp Lys Gly Ala Phe Pro Leu Arg Thr Arg Met Pro Ser Gly Glu Leu
130 135 140

Val Lys Ala Leu Thr His Val Arg Tyr Ala Ala Ser Asn Glu Glu Tyr
145 150 155 160

Arg Ala Ile Phe Arg Gly Val Gln Leu Glu Phe Ser Pro Gln Gly Phe
165 170 175

Arg Ala Val Ala Ser Asp Gly Tyr Arg Leu Ala Leu Tyr Asp Leu Pro
180 185 190

Leu Pro Gln Gly Phe Gln Ala Lys Ala Val Val Pro Ala Arg Ser Val
195 200 205

Asp Glu Met Val Arg Val Leu Lys Gly Ala Asp Gly Ala Glu Ala Val
210 215 220

Leu Ala Leu Gly Glu Gly Val Leu Ala Leu Ala Leu Glu Gly Gly Ser
225 230 235 240

Gly Val Arg Met Ala Leu Arg Leu Met Glu Gly Glu Phe Pro Asp Tyr
245 250 255

Gln Arg Val Ile Pro Gln Glu Phe Ala Leu Lys Val Gln Val Glu Gly
260 265 270

Glu Ala Leu Arg Glu Ala Val Arg Arg Val Ser Val Leu Ser Asp Arg
275 280 285

Gln Asn His Arg Val Asp Leu Leu Leu Glu Glu Gly Arg Ile Leu Leu
290 295 300

Ser Ala Glu Gly Asp Tyr Gly Lys Gly Gln Glu Glu Val Pro Ala Gln
305 310 315 320

Val Glu Gly Pro Asp Met Ala Val Ala Tyr Asn Ala Arg Tyr Leu Leu
325 330 335

Glu Ala Leu Ala Pro Val Gly Asp Arg Ala His Leu Gly Ile Ser Gly
340 345 350

Pro Thr Ser Pro Ser Leu Ile Trp Gly Asp Gly Glu Gly Tyr Arg Ala
355 360 365

Val Val Val Pro Leu Arg Val Glx
370 375



<210> 109 
<211> 367 
<212> PRT 

<213> Escherichia coli 
<400> 109 

Met Lys Phe Thr Val Glu Arg Glu His Leu Leu Lys Pro Leu Gln Gln
1 5 10 15

Val Ser Gly Pro Leu Gly Gly Arg Pro Thr Leu Pro Ile Leu Gly Asn
20 25 30

Leu Leu Leu Gln Val Ala Asp Gly Thr Leu Ser Leu Thr Gly Thr Asp
35 40 45

Leu Glu Met Glu Met Val Ala Arg Val Ala Leu Val Gln Pro His Glu
50 55 60

Pro Gly Ala Thr Thr Val Pro Ala Arg Lys Phe Phe Asp Ile Cys Arg
65 70 75 80

Gly Leu Pro Glu Gly Ala Glu Ile Ala Val Gln Leu Glu Gly Glu Arg
85 90 95

Met Leu Val Arg Ser Gly Arg Ser Arg Phe Ser Leu Ser Thr Leu Pro
100 105 110

Ala Ala Asp Phe Pro Asn Leu Asp Asp Trp Gln Ser Glu Val Glu Phe
115 120 125

Thr Leu Pro Gln Ala Thr Met Lys Arg Leu Ile Glu Ala Thr Gln Phe
130 135 140

Ser Met Ala His Gln Asp Val Arg Tyr Tyr Leu Asn Gly Met Leu Phe
145 150 155 160

Glu Thr Glu Gly Glu Glu Leu Arg Thr Val Ala Thr Asp Gly His Arg
165 170 175

Leu Ala Val Cys Ser Met Pro Ile Gly Gln Ser Leu Pro Ser His Ser
180 185 190

Val Ile Val Pro Arg Lys Gly Val Ile Glu Leu Met Arg Met Leu Asp
195 200 205

Gly Gly Asp Asn Pro Leu Arg Val Gln Ile Gly Ser Asn Asn Ile Arg
210 215 220

Ala His Val Gly Asp Phe Ile Phe Thr Ser Lys Leu Val Asp Gly Arg
225 230 235 240

Phe Pro Asp Tyr Arg Arg Val Leu Pro Lys Asn Pro Asp Lys His Leu
245 250 255

Glu Ala Gly Cys Asp Leu Leu Lys Gln Ala Phe Ala Arg Ala Ala Ile
260 265 270

Leu Ser Asn Glu Lys Phe Arg Gly Val Arg Leu Tyr Val Ser Glu Asn
275 280 285

Gln Leu Lys Ile Thr Ala Asn Asn Pro Glu Gln Glu Glu Ala Glu Glu
290 295 300

Ile Leu Asp Val Thr Tyr Ser Gly Ala Glu Met Glu Ile Gly Phe Asn
305 310 315 320

Val Ser Tyr Val Leu Asp Val Leu Asn Ala Leu Lys Cys Glu Asn Val
325 330 335

Arg Met Met Leu Thr Asp Ser Val Ser Ser Val Gln Ile Glu Asp Ala
340 345 350

Ala Ser Gln Ser Ala Ala Tyr Val Val Met Pro Met Arg Leu Glx
355 360 365



<210> 110 
<211> 367 
<212> PRT 

<213> Proteus mirabilis 



<400> 110 

Met Lys Phe Ile Ile Glu Arg Glu Gln Leu Leu Lys Pro Leu Gln Gln
1 5 10 15

Val Ser Gly Pro Leu Gly Gly Arg Pro Thr Leu Pro Ile Leu Gly Asn
20 25 30

Leu Leu Leu Lys Val Thr Glu Asn Thr Leu Ser Leu Thr Gly Thr Asp
35 40 45

Leu Glu Met Glu Met Met Ala Arg Val Ser Leu Ser Gln Ser His Glu
50 55 60

Ile Gly Ala Thr Thr Val Pro Ala Arg Lys Phe Phe Asp Ile Trp Arg
65 70 75 80

Gly Leu Pro Glu Gly Ala Glu Ile Ser Val Glu Leu Asp Gly Asp Arg
85 90 95

Leu Leu Val Arg Ser Gly Arg Ser Arg Phe Ser Leu Ser Thr Leu Pro
100 105 110

Ala Ser Asp Phe Pro Asn Leu Asp Asp Trp Gln Ser Glu Val Glu Phe
115 120 125

Thr Leu Pro Gln Ala Thr Leu Lys Arg Leu Ile Glu Ser Thr Gln Phe
130 135 140

Ser Met Ala His Gln Asp Val Arg Tyr Tyr Leu Asn Gly Met Leu Phe
145 150 155 160

Glu Thr Glu Asn Thr Glu Leu Arg Thr Val Ala Thr Asp Gly His Arg
165 170 175

Leu Ala Val Cys Ala Met Asp Ile Gly Gln Ser Leu Pro Gly His Ser
180 185 190

Val Ile Val Pro Arg Lys Gly Val Ile Glu Leu Met Arg Leu Leu Asp
195 200 205

Gly Ser Gly Glu Ser Leu Leu Gln Leu Gln Ile Gly Ser Asn Asn Leu
210 215 220

Arg Ala His Val Gly Asp Phe Ile Phe Thr Ser Lys Leu Val Asp Gly
225 230 235 240

Arg Phe Pro Asp Tyr Arg Arg Val Leu Pro Lys Asn Pro Thr Lys Thr
245 250 255

Val Ile Ala Gly Cys Asp Ile Leu Lys Gln Ala Phe Ser Arg Ala Ala
260 265 270

Ile Leu Ser Asn Glu Lys Phe Arg Gly Val Arg Ile Asn Leu Thr Asn
275 280 285

Gly Gln Leu Lys Ile Thr Ala Asn Asn Pro Glu Gln Glu Glu Ala Glu
290 295 300

Glu Ile Val Asp Val Gln Tyr Gln Gly Glu Glu Met Glu Ile Gly Phe
305 310 315 320

Asn Val Ser Tyr Leu Leu Asp Val Leu Asn Thr Leu Lys Cys Glu Glu
325 330 335

Val Lys Leu Leu Leu Thr Asp Ala Val Ser Ser Val Gln Val Glu Asn
340 345 350

Val Ala Ser Ala Ala Ala Ala Tyr Val Val Met Pro Met Arg Leu
355 360 365



<210> 111 
<211> 366 
<212> PRT 

<213> Haemophilus influenzae 
<400> 111 

Met Gln Phe Ser Ile Ser Arg Glu Asn Leu Leu Lys Pro Leu Gln Gln
1 5 10 15

Val Cys Gly Val Leu Ser Asn Arg Pro Asn Ile Pro Val Leu Asn Asn
20 25 30

Val Leu Leu Gln Ile Glu Asp Tyr Arg Leu Thr Ile Thr Gly Thr Asp
35 40 45

Leu Glu Val Glu Leu Ser Ser Gln Thr Gln Leu Ser Ser Ser Ser Glu
50 55 60

Asn Gly Thr Phe Thr Ile Pro Ala Lys Lys Phe Leu Asp Ile Cys Arg
65 70 75 80

Thr Leu Ser Asp Asp Ser Glu Ile Thr Val Thr Phe Glu Gln Asp Arg
85 90 95

Ala Leu Val Gln Ser Gly Arg Ser Arg Phe Thr Leu Ala Thr Gln Pro
100 105 110

Ala Glu Glu Tyr Pro Asn Leu Thr Asp Trp Gln Ser Glu Val Asp Phe
115 120 125

Glu Leu Pro Gln Asn Thr Leu Arg Arg Leu Ile Glu Ala Thr Gln Phe
130 135 140

Ser Met Ala Asn Gln Asp Ala Arg Tyr Phe Leu Asn Gly Met Lys Phe
145 150 155 160

Glu Thr Glu Gly Asn Leu Leu Arg Thr Val Ala Thr Asp Gly His Arg
165 170 175

Leu Ala Val Cys Thr Ile Ser Leu Glu Gln Glu Leu Gln Asn His Ser
180 185 190

Val Ile Leu Pro Arg Lys Gly Val Leu Glu Leu Val Arg Leu Leu Glu
195 200 205

Thr Asn Asp Glu Pro Ala Arg Leu Gln Ile Gly Thr Asn Asn Leu Arg
210 215 220

Val His Leu Lys Asn Thr Val Phe Thr Ser Lys Leu Ile Asp Gly Arg
225 230 235 240

Phe Pro Asp Tyr Arg Arg Val Leu Pro Arg Asn Ala Thr Lys Ile Val
245 250 255

Glu Gly Asn Trp Glu Met Leu Lys Gln Ala Phe Ala Arg Ala Ser Ile
260 265 270

Leu Ser Asn Glu Arg Ala Arg Ser Val Arg Leu Ser Leu Lys Glu Asn
275 280 285

Gln Leu Lys Ile Thr Ala Ser Asn Thr Glu His Glu Glu Ala Glu Glu
290 295 300

Ile Val Asp Val Asn Tyr Asn Gly Glu Glu Leu Glu Val Gly Phe Asn
305 310 315 320

Val Thr Tyr Ile Leu Asp Val Leu Asn Ala Leu Lys Cys Asn Gln Val
325 330 335

Arg Met Cys Leu Thr Asp Ala Phe Ser Ser Cys Leu Ile Glu Asn Cys
340 345 350

Glu Asp Ser Ser Cys Glu Tyr Val Ile Met Pro Met Arg Leu
355 360 365



<210> 112 
<211> 367 
<212> PRT 

<213> Pseudomonas putida 
<400> 112 

Met His Phe Thr Ile Gln Arg Glu Ala Leu Leu Lys Pro Leu Gln Leu
1 5 10 15

Val Ala Gly Val Val Glu Arg Arg Gln Thr Leu Pro Val Leu Ser Asn
20 25 30

Val Leu Leu Val Val Gln Gly Gln Gln Leu Ser Leu Thr Gly Thr Asp
35 40 45

Leu Glu Val Glu Leu Val Gly Arg Val Gln Leu Glu Glu Pro Ala Glu
50 55 60

Pro Gly Glu Ile Thr Val Pro Ala Arg Lys Leu Met Asp Ile Cys Lys
65 70 75 80

Ser Leu Pro Asn Asp Ala Leu Ile Asp Ile Lys Val Asp Glu Gln Lys
85 90 95

Leu Leu Val Lys Ala Gly Arg Ser Arg Phe Thr Leu Ser Thr Leu Pro
100 105 110

Ala Asn Asp Phe Pro Thr Val Glu Glu Gly Pro Gly Ser Leu Thr Cys
115 120 125

Asn Leu Glu Gln Ser Lys Leu Arg Arg Leu Ile Glu Arg Thr Ser Phe
130 135 140

Ala Met Ala Gln Gln Asp Val Arg Tyr Tyr Leu Asn Gly Met Leu Leu
145 150 155 160

Glu Val Ser Arg Asn Thr Leu Arg Ala Val Ser Thr Asp Gly His Arg
165 170 175

Leu Ala Leu Cys Ser Met Ser Ala Pro Ile Glu Gln Glu Asp Arg His
180 185 190

Gln Val Ile Val Pro Arg Lys Gly Ile Leu Glu Leu Ala Arg Leu Leu
195 200 205

Thr Asp Pro Glu Gly Met Val Ser Ile Val Leu Gly Gln His His Ile
210 215 220

Arg Ala Thr Thr Gly Glu Phe Thr Phe Thr Ser Lys Leu Val Asp Gly
225 230 235 240

Lys Phe Pro Asp Tyr Glu Arg Val Leu Pro Lys Gly Gly Asp Lys Leu
245 250 255

Val Val Gly Asp Arg Gln Ala Leu Arg Glu Ala Phe Ser Arg Thr Ala
260 265 270

Ile Leu Ser Asn Glu Lys Tyr Arg Gly Ile Arg Leu Gln Leu Ala Ala
275 280 285

Gly Gln Leu Lys Ile Gln Ala Asn Asn Pro Glu Gln Glu Glu Ala Glu
290 295 300

Glu Glu Ile Ser Val Asp Tyr Glu Gly Ser Ser Leu Glu Ile Gly Phe
305 310 315 320

Asn Val Ser Tyr Leu Leu Asp Val Leu Gly Val Met Thr Thr Glu Gln
325 330 335

Val Arg Leu Ile Leu Ser Asp Ser Asn Ser Ser Ala Leu Leu Gln Glu
340 345 350

Ala Gly Asn Asp Asp Ser Ser Tyr Val Val Met Pro Met Arg Leu
355 360 365



<210> 113 
<211> 366 
<212> PRT 

<213> Buchnera aphidicola 
<400> 113 

Met Lys Phe Thr Ile Gln Asn Asp Ile Leu Thr Lys Asn Leu Lys Lys
1 5 10 15

Ile Thr Arg Val Leu Val Lys Asn Ile Ser Phe Pro Ile Leu Glu Asn
20 25 30

Ile Leu Ile Gln Val Glu Asp Gly Thr Leu Ser Leu Thr Thr Thr Asn
35 40 45

Leu Glu Ile Glu Leu Ile Ser Lys Ile Glu Ile Ile Thr Lys Tyr Ile
50 55 60

Pro Gly Lys Thr Thr Ile Ser Gly Arg Lys Ile Leu Asn Ile Cys Arg
65 70 75 80

Thr Leu Ser Glu Lys Ser Lys Ile Lys Met Gln Leu Lys Asn Lys Lys
85 90 95

Met Tyr Ile Ser Ser Glu Asn Ser Asn Tyr Ile Leu Ser Thr Leu Ser
100 105 110

Ala Asp Thr Phe Pro Asn His Gln Asn Phe Asp Tyr Ile Ser Lys Phe
115 120 125

Asp Ile Ser Ser Asn Ile Leu Lys Glu Met Ile Glu Lys Thr Glu Phe
130 135 140

Ser Met Gly Lys Gln Asp Val Arg Tyr Tyr Leu Asn Gly Met Leu Leu
145 150 155 160

Glu Lys Lys Asp Lys Phe Leu Arg Ser Val Ala Thr Asp Gly Tyr Arg
165 170 175

Leu Ala Ile Ser Tyr Thr Gln Leu Lys Lys Asp Ile Asn Phe Phe Ser
180 185 190

Ile Ile Ile Pro Asn Lys Ala Val Met Glu Leu Leu Lys Leu Leu Asn
195 200 205

Thr Gln Pro Gln Leu Leu Asn Ile Leu Ile Gly Ser Asn Ser Ile Arg
210 215 220

Ile Tyr Thr Lys Asn Leu Ile Phe Thr Thr Gln Leu Ile Glu Gly Glu
225 230 235 240

Tyr Pro Asp Tyr Lys Ser Val Leu Phe Lys Glu Lys Lys Asn Pro Ile
245 250 255

Ile Thr Asn Ser Ile Leu Leu Lys Lys Ser Leu Leu Arg Val Ala Ile
260 265 270

Leu Ala His Glu Lys Phe Cys Gly Ile Glu Ile Lys Ile Glu Asn Gly
275 280 285

Lys Phe Lys Val Leu Ser Asp Asn Gln Glu Glu Glu Thr Ala Glu Asp
290 295 300

Leu Phe Glu Ile Asp Tyr Phe Gly Glu Lys Ile Glu Ile Ser Ile Asn
305 310 315 320

Val Tyr Tyr Leu Leu Asp Val Ile Asn Asn Ile Lys Ser Glu Asn Ile
325 330 335

Ala Leu Phe Leu Asn Lys Ser Lys Ser Ser Ile Gln Ile Glu Ala Glu
340 345 350

Asn Asn Ser Ser Asn Ala Tyr Val Val Met Leu Leu Lys Arg
355 360 365



<210> 114 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 114 

gtgtggatcc tcgtccccct catgcgcgac caggaaggg 



<210> 115
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 115 

gtgtggatcc gtggtgacct tagccac 



<210> 116 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 116 

ttcgtgtccg aggaccttgt ggtccacaac 



<210> 117 
<211> 3514 
<212> DNA 

<213> Aquifex aeolicus 
<400> 117 

atgagtaagg atttcgtcca ccttcacctg cacacccagt tctcactcct ggacggggct 60
ataaagatag acgagctcgt gaaaaaggca aaggagtatg gatacaaagc tgtcggaatg 120
tcagaccacg gaaacctctt cggttcgtat aaattctaca aagccctgaa ggcggaagga 180
attaagccca taatcggcat ggaagcctac tttaccacgg gttcgaggtt tgacagaaag 240
actaaaacga gcgaggacaa cataaccgac aagtacaacc accacctcat acttatagca 300
aaggacgaaa aggtctaaag aacttaatga agctctcaac cctcgcctac aaagaaggtt 360
tttactacaa acccagaatt gattacgaac tccttgaaaa gtacggggag ggcctaatag 420
cccttaccgc atgcctgaaa ggtgttccca cctactacgc ttctataaac gaagtgaaaa 480
aggcggagga atgggtaaag aagttcaagg atatattcgg agatgacctt tatttagaac 540
ttcaagcgaa caacattcca gaacaggaag tggcaaacag gaacttaata gagatagcca 600
aaaagtacga tgtgaaactc atagcgacgc aggacgccca ctacctcaat cccgaagaca 660
ggtacgccca cacggttctt atggcacttc aaatgaaaaa gaccattcac gaactgagtt 720
cgggaaactt caagtgttca aacgaagacc ttcactttgc tccacccgag tacatgtgga 780
aaaagtttga aggtaagttc gaaggctggg aaaaggcact cctgaacact ctcgaggtaa 840
tggaaaagac agcggacagc tttgagatat ttgaaaactc cacctacctc cttcccaagt 900
acgacgttcc gcccgacaaa acccttgagg aatacctcag agaactcgcg tacaaaggtt 960
taagacagag gatagaaagg ggacaagcta aggatactaa agagtactgg gagaggctcg 1020
agtacgaact ggaagttata aacaaaatgg gctttgcggg atacttcttg atagttcagg 1080
acttcataaa ctgggctaag aaaaacgaca tacctgttgg acccggaagg ggaagtgctg 1140 
gaggttccct cgtcgcatac gccatcggaa taacggacgt tgaccctata aagcacggat 1200
tcctttttga gaggttctta aaccccgaaa gggtttccat gccggatata gacgtggatt 1260
tctgtcagga caacagggaa aaggtcatag agtacgtaag gaacaagtac ggacacgaca 1320
acgtagctca gataatcacc tacaacgtaa tgaaggcgaa gcaaacactg agagacgtcg 1380
caagggccat gggactcccc tactccaccg cggacaaact cgcaaaactc attcctcagg 1440 
gggacgttca gggaacgtgg ctcagtctgg aagagatgta caaaacgcct gtggaggaac 1500
tccttcagaa gtacggagaa cacagaacgg acatagagga caacgtaaag aagttcagac 1560
agatatgcga agaaagtccg gagataaaac agctcgttga gacggccctg aagcttgaag 1620
gtctcacgag acacacctcc ctccacgccg cgggagtggt tatagcacca aagcccttga 1680 
gcgagctcgt tcccctctac tacgataaag agggcgaagt cgcaacccag tacgacatgg 1740 
ttcagctcga agaactcggt ctcctgaaga tggacttcct cggactcaaa accctcacag 1800 
aactgaaact catgaaagaa ctcataaagg aaagacacgg agtggatata aacttccttg 1860 
aacttcccct tgacgacccg aaagtttaca aactccttca ggaaggaaaa accacgggag 1920 
tgttccagct cgaaagcagg ggaatgaaag aactcctgaa gaaactaaag cccgacagct 1980 
ttgacgacat cgttgcggtc ctcgcactct acagacccgg acctctaaag agcggactcg 2040
ttgacacata cattaagaga aagcacggaa aagaacccgt tgagtacccc ttcccggagc 2100 
ttgaacccgt ccttaaggaa acctacggag taatcgttta tcaggaacag gtgatgaaga 2160 
tgtctcagat actttccggc tttactcccg gagaggcgga taccctcaga aaggcgatag 2220
gtaagaagaa agcggattta atggctcaga tgaaagacaa gttcatacag ggagcggtgg 2280 
aaaggggata ccctgaagaa aagataagga agctctggga agacatagag aagttcgctt 2340
cctactcctt caacaagtct cactcggtag cttacgggta catctcctac tggaccgcct 2400 
acgttaaagc ccactatccc gcggagttct tcgcggtaaa actcacaact gaaaagaacg 2460 
acaacaagtt cctcaacctc ataaaagacg ctaaactctt cggatttgag atacttcccc 2520
ccgacataaa caagagtgat gtaggattta cgatagaagg tgaaaacagg ataaggttcg 2580
ggcttgcgag gataaaggga gtgggagagg aaactgctaa gataatcgtt gaagctagaa 2640
agaagtataa gcagttcaaa gggcttgcgg acttcataaa caaaaccaag aacaggaaga 2700
taaacaagaa agtcgtggaa gcactcgtaa aggcaggggc ttttgacttt actaagaaaa 2760
agaggaaaga actactcgct aaagtggcaa actctgaaaa agcattaatg gctacacaaa 2820
actccctttt cggtgcaccg aaagaagaag tggaagaact cgacccctta aagcttgaaa 2880
aggaagttct cggtttttac atttcagggc acccccttga caactacgaa aagctcctca 2940 
agaaccgcta cacacccatt gaagatttag aagagtggga caaggaaagc gaagcggtgc 3000
ttacaggagt tatcacggaa ctcaaagtaa aaaagacgaa aaacggagat tacatggcgg 3060
tcttcaacct cgttgacaag acgggactaa tagagtgtgt cgtcttcccg ggagtttacg 3120
aagaggcaaa ggaactgata gaagaggaca gagtagtggt agtcaaaggt tttctggacg 3180 
aggaccttga aacggaaaat gtcaagttcg tggtgaaaga ggttttctcc cctgaggagt 3240
tcgcaaagga gatgaggaat accctttata tattcttaaa aagagagcaa gccctaaacg 3300 
gcgttgccga aaaactaaag ggaattattg aaaacaacag gacggaggac ggatacaact 3360 
tggttctcac ggttgatctg ggagactact tcgttgattt agcactccca caagatatga 3420 
aactaaaggc tgacagaaag gttgtagagg agatagaaaa actgggagtg aaggtcataa 3480 
tttagtaaat aacccttact tccgagtagt cccc 3514 



<210> 118 
<211> 1161 
<212> PRT 

<213> Aquifex aeolicus
<400> 118

Met Ser Lys Asp Phe Val His Leu His Leu His Thr Gln Phe Ser Leu
1 5 10 15

Leu Asp Gly Ala Ile Lys Ile Asp Glu Leu Val Lys Lys Ala Lys Glu
20 25 30 

Tyr Gly Tyr Lys Ala Val Gly Met Ser Asp His Gly Asn Leu Phe Gly 
35 40 45 

Ser Tyr Lys Phe Tyr Lys Ala Leu Lys Ala Glu Gly Ile Lys Pro Ile
50 55 60

Ile Gly Met Glu Ala Tyr Phe Thr Thr Gly Ser Arg Phe Asp Arg Lys
65 70 75 80

Thr Lys Thr Ser Glu Asp Asn Ile Thr Asp Lys Tyr Asn His His Leu
85 90 95

Ile Leu Ile Ala Lys Asp Asp Lys Gly Leu Lys Asn Leu Met Lys Leu
100 105 110

Ser Thr Leu Ala Tyr Lys Glu Gly Phe Tyr Tyr Lys Pro Arg Ile Asp
115 120 125

Tyr Glu Leu Leu Glu Lys Tyr Gly Glu Gly Leu Ile Ala Leu Thr Ala
130 135 140

Cys Leu Lys Gly Val Pro Thr Tyr Tyr Ala Ser Ile Asn Glu Val Lys
145 150 155 160

Lys Ala Glu Glu Trp Val Lys Lys Phe Lys Asp Ile Phe Gly Asp Asp
165 170 175

Leu Tyr Leu Glu Leu Gln Ala Asn Asn Ile Pro Glu Gln Glu Val Ala
180 185 190

Asn Arg Asn Leu Ile Glu Ile Ala Lys Lys Tyr Asp Val Lys Leu Ile
195 200 205

Ala Thr Gln Asp Ala His Tyr Leu Asn Pro Glu Asp Arg Tyr Ala His
210 215 220

Thr Val Leu Met Ala Leu Gln Met Lys Lys Thr Ile His Glu Leu Ser
225 230 235 240

Ser Gly Asn Phe Lys Cys Ser Asn Glu Asp Leu His Phe Ala Pro Pro
245 250 255

Glu Tyr Met Trp Lys Lys Phe Glu Gly Lys Phe Glu Gly Trp Glu Lys
260 265 270 

Ala Leu Leu Asn Thr Leu Glu Val Met Glu Lys Thr Ala Asp Ser Phe 
275 280 285 

Glu Ile Phe Glu Asn Ser Thr Tyr Leu Leu Pro Lys Tyr Asp Val Pro
290 295 300 

Pro Asp Lys Thr Leu Glu Glu Tyr Leu Arg Glu Leu Ala Tyr Lys Gly 
305 310 315 320 

Leu Arg Gln Arg Ile Glu Arg Gly Gln Ala Lys Asp Thr Lys Glu Tyr
325 330 335

Trp Glu Arg Leu Glu Tyr Glu Leu Glu Val Ile Asn Lys Met Gly Phe
340 345 350

Ala Gly Tyr Phe Leu Ile Val Gln Asp Phe Ile Asn Trp Ala Lys Lys
355 360 365

Asn Asp Ile Pro Val Gly Pro Gly Arg Gly Ser Ala Gly Gly Ser Leu
370 375 380

Val Ala Tyr Ala Ile Gly Ile Thr Asp Val Asp Pro Ile Lys His Gly
385 390 395 400 

Phe Leu Phe Glu Arg Phe Leu Asn Pro Glu Arg Val Ser Met Pro Asp 
405 410 415 

Ile Asp Val Asp Phe Cys Gln Asp Asn Arg Glu Lys Val Ile Glu Tyr
420 425 430

Val Arg Asn Lys Tyr Gly His Asp Asn Val Ala Gln Ile Ile Thr Tyr
435 440 445

Asn Val Met Lys Ala Lys Gln Thr Leu Arg Asp Val Ala Arg Ala Met
450 455 460

Gly Leu Pro Tyr Ser Thr Ala Asp Lys Leu Ala Lys Leu Ile Pro Gln
465 470 475 480

Gly Asp Val Gln Gly Thr Trp Leu Ser Leu Glu Glu Met Tyr Lys Thr
485 490 495

Pro Val Glu Glu Leu Leu Gln Lys Tyr Gly Glu His Arg Thr Asp Ile
500 505 510

Glu Asp Asn Val Lys Lys Phe Arg Gln Ile Cys Glu Glu Ser Pro Glu
515 520 525

Ile Lys Gln Leu Val Glu Thr Ala Leu Lys Leu Glu Gly Leu Thr Arg
530 535 540

His Thr Ser Leu His Ala Ala Gly Val Val Ile Ala Pro Lys Pro Leu
545 550 555 560 

Ser Glu Leu Val Pro Leu Tyr Tyr Asp Lys Glu Gly Glu Val Ala Thr 
565 570 575 

Gln Tyr Asp Met Val Gln Leu Glu Glu Leu Gly Leu Leu Lys Met Asp
580 585 590 

Phe Leu Gly Leu Lys Thr Leu Thr Glu Leu Lys Leu Met Lys Glu Leu 
595 600 605 

Ile Lys Glu Arg His Gly Val Asp Ile Asn Phe Leu Glu Leu Pro Leu
610 615 620

Asp Asp Pro Lys Val Tyr Lys Leu Leu Gln Glu Gly Lys Thr Thr Gly
625 630 635 640

Val Phe Gln Leu Glu Ser Arg Gly Met Lys Glu Leu Leu Lys Lys Leu
645 650 655

Lys Pro Asp Ser Phe Asp Asp Ile Val Ala Val Leu Ala Leu Tyr Arg
660 665 670

Pro Gly Pro Leu Lys Ser Gly Leu Val Asp Thr Tyr Ile Lys Arg Lys
675 680 685 

His Gly Lys Glu Pro Val Glu Tyr Pro Phe Pro Glu Leu Glu Pro Val 
690 695 700 

Leu Lys Glu Thr Tyr Gly Val Ile Val Tyr Gln Glu Gln Val Met Lys
705 710 715 720

Met Ser Gln Ile Leu Ser Gly Phe Thr Pro Gly Glu Ala Asp Thr Leu
725 730 735

Arg Lys Ala Ile Gly Lys Lys Lys Ala Asp Leu Met Ala Gln Met Lys
740 745 750

Asp Lys Phe Ile Gln Gly Ala Val Glu Arg Gly Tyr Pro Glu Glu Lys
755 760 765

Ile Arg Lys Leu Trp Glu Asp Ile Glu Lys Phe Ala Ser Tyr Ser Phe
770 775 780

Asn Lys Ser His Ser Val Ala Tyr Gly Tyr Ile Ser Tyr Trp Thr Ala
785 790 795 800 

Tyr Val Lys Ala His Tyr Pro Ala Glu Phe Phe Ala Val Lys Leu Thr 
805 810 815 

Thr Glu Lys Asn Asp Asn Lys Phe Leu Asn Leu Ile Lys Asp Ala Lys
820 825 830

Leu Phe Gly Phe Glu Ile Leu Pro Pro Asp Ile Asn Lys Ser Asp Val
835 840 845

Gly Phe Thr Ile Glu Gly Glu Asn Arg Ile Arg Phe Gly Leu Ala Arg
850 855 860

Ile Lys Gly Val Gly Glu Glu Thr Ala Lys Ile Ile Val Glu Ala Arg
865 870 875 880

Lys Lys Tyr Lys Gln Phe Lys Gly Leu Ala Asp Phe Ile Asn Lys Thr
885 890 895

Lys Asn Arg Lys Ile Asn Lys Lys Val Val Glu Ala Leu Val Lys Ala
900 905 910 

Gly Ala Phe Asp Phe Thr Lys Lys Lys Arg Lys Glu Leu Leu Ala Lys 
915 920 925 

Val Ala Asn Ser Glu Lys Ala Leu Met Ala Thr Gln Asn Ser Leu Phe
930 935 940 

Gly Ala Pro Lys Glu Glu Val Glu Glu Leu Asp Pro Leu Lys Leu Glu 
945 950 955 960 

Lys Glu Val Leu Gly Phe Tyr Ile Ser Gly His Pro Leu Asp Asn Tyr
965 970 975

Glu Lys Leu Leu Lys Asn Arg Tyr Thr Pro Ile Glu Asp Leu Glu Glu
980 985 990

Trp Asp Lys Glu Ser Glu Ala Val Leu Thr Gly Val Ile Thr Glu Leu
995 1000 1005

Lys Val Lys Lys Thr Lys Asn Gly Asp Tyr Met Ala Val Phe Asn Leu
1010 1015 1020

Val Asp Lys Thr Gly Leu Ile Glu Cys Val Val Phe Pro Gly Val Tyr
1025 1030 1035 1040

Glu Glu Ala Lys Glu Leu Ile Glu Glu Asp Arg Val Val Val Val Lys
1045 1050 1055 

Gly Phe Leu Asp Glu Asp Leu Glu Thr Glu Asn Val Lys Phe Val Val 
1060 1065 1070 

Lys Glu Val Phe Ser Pro Glu Glu Phe Ala Lys Glu Met Arg Asn Thr 
1075 1080 1085 

Leu Tyr Ile Phe Leu Lys Arg Glu Gln Ala Leu Asn Gly Val Ala Glu
1090 1095 1100

Lys Leu Lys Gly Ile Ile Glu Asn Asn Arg Thr Glu Asp Gly Tyr Asn
1105 1110 1115 1120

Leu Val Leu Thr Val Asp Leu Gly Asp Tyr Phe Val Asp Leu Ala Leu
1125 1130 1135

Pro Gln Asp Met Lys Leu Lys Ala Asp Arg Lys Val Val Glu Glu Ile
1140 1145 1150

Glu Lys Leu Gly Val Lys Val Ile Ile
1155 1160



<210> 119 
<211> 2408 
<212> DNA 

<213> Aquifex aeolicus 
<400> 119 

atgaactacg ttcccttcgc gagaaagtac agaccgaaat tcttcaggga agtaatagga 60
caggaagctc ccgtaaggat actcaaaaac gctataaaaa acgacagagt ggctcacgcc 120
tacctctttg ccggaccgag gggggttggg aagacgacta ttgcaagaat tctcgcaaaa 180
gctttgaact gtaaaaatcc ctccaaaggt gagccctgcg gtgagtgcga aaactgcagg 240
gagatagaca ggggtgtgtt ccctgactta attgaaatgg atgccgcctc aaacaggggt 300
atagacgacg taagggcatt aaaagaagcg gtcaattaca aacctataaa aggaaagtac 360
aaggtttaca taatagacga agctcacatg ctcacgaaag aagctttcaa cgctctctta 420
aaaaccctcg aagagccccc tcccagaact gttttcgtcc tttgtaccac ggagtacgac 480
aaaattcttc ccacgatact ctcaaggtgt cagaggataa tcttctcaaa ggtaagaaag 540
gaaaaagtaa tagagtatct aaaaaagata tgtgaaaagg aagggattga gtgcgaagag 600
ggagcccttg aggttctggc tcatgcctct gaagggtgca tgagggatgc agcctctctc 660
ctggaccagg cgagcgttta cggggaaggc agggtaacaa aagaagtagt ggagaacttc 720



ctcggaattc tcagtcagga aagcgttagg agttttctga aattgcttct gaactcagaa 780
gtggacgaag ctataaagtt cctcagagaa ctctcagaaa agggctacaa cctgaccaag 840
ttttgggaga tgttagaaga ggaagtgaga aacgcaattt tagtaaagag cctgaaaaat 900 
cccgaaagcg tggttcagaa ctggcaggat tacgaagact tcaaagacta ccctctggaa 960
gccctcctct acgttgagaa cctgataaac aggggtaaag ttgaagcgag aacgagagaa 1020
cccttaagag cctttgaact cgcggtaata aagagcctta tagtcaaaga cataattccc 1080
gtatcccagc tcggaagtgt ggtaaaggaa accaaaaagg aagaaaagaa agttgaagta 1140
aaagaagagc caaaagtaaa agaagaaaaa ccaaaggagc aggaagagga caggttccag 1200
aaagttttaa acgctgtgga cggcaaaatc cttaaaagaa tacttgaagg ggcaaaaagg 1260
gaagaaagag acggaaaaat cgtcctaaag atagaagcct cttatctgag aaccatgaaa 1320 
aaggaatttg actcactaaa ggagactttt ccttttttag agtttgaacc cgtggaggat 1380 
aaaaaaaaac ctcagaagtc cagcgggacg aggctgtttt aaaggtaaag gagctcttca 1440
atgcaaaaat actcaaagta cgaagtaaaa gctaaggtca taaaggtgag aatgcccgtg 1500 
gaagagatag ggctgtttaa cgcactaata gacggcttgc ccaggtacgc actcacgagg 1560
acgaaggaaa agggaaaggg agaagttttc gttttagcga ctccttataa agtcaaggaa 1620
ttgatggaag ctatggaggg tatgaaaaaa cacataaagg atttagaaat cctcggagag 1680 
acggatgagg atttaacttt ttaaagtatg ggtgtatctg agcaaaggtt taagctaaaa 1740 
acaaacctga aacccgcagg ggaccagccg aaagccataa aaaaactcct tgaaaaccta 1800 
aggaaaggcg taaaagaaca aacacttctc ggagtcacgg gaagcggaaa gacttttact 1860 
ctagcaaacg taatagcgaa gtacaacaaa ccaactcttg tggtagttca caacaaaatt 1920 
ctcgcggcac agctatacag ggagtttaaa gaactattcc ctgaaaacgc tgtagagtac 1980 
tttgtctctt actacgacta ttaccaacct gaagcctaca ttcccgaaaa agatttatac 2040 
atagaaaagg acgcgagtat aaacgaaagc tggaacgttt cagacactcc gccacgatat 2100
ccgttctaga aaggagggac gttatagtag ttgcttcagt ttcttgcata tacggactcg 2160
ggaaacctga gcactacgaa aacctgagga taaaactcca aaggggaata agactgaact 2220
tgagtaagct cctgaggaaa ctcgttgagc taggatatca gagaaatgac tttgccataa 2280
agagggctac cttctcggtt aggggagacg tggttgagat agtcccttct cacacggaag 2340
attacctcgt gagggtagag ttctgggacg acgaagttga aagaatagtc ctcatggacg 2400
ctctgaac 2408



<210> 120 
<211> 473 
<212> PRT 

<213> Aquifex aeolicus 
<400> 120 

Met Asn Tyr Val Pro Phe Ala Arg Lys Tyr Arg Pro Lys Phe Phe Arg
1 5 10 15

Glu Val Ile Gly Gln Glu Ala Pro Val Arg Ile Leu Lys Asn Ala Ile
20 25 30



Lys Asn Asp Arg Val Ala His Ala Tyr Leu Phe Ala Gly Pro Arg Gly 
35 40 45 

Val Gly Lys Thr Thr Ile Ala Arg Ile Leu Ala Lys Ala Leu Asn Cys
50 55 60

Lys Asn Pro Ser Lys Gly Glu Pro Cys Gly Glu Cys Glu Asn Cys Arg
65 70 75 80

Glu Ile Asp Arg Gly Val Phe Pro Asp Leu Ile Glu Met Asp Ala Ala
85 90 95

Ser Asn Arg Gly Ile Asp Asp Val Arg Ala Leu Lys Glu Ala Val Asn
100 105 110

Tyr Lys Pro Ile Lys Gly Lys Tyr Lys Val Tyr Ile Ile Asp Glu Ala
115 120 125

His Met Leu Thr Lys Glu Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140 

Glu Pro Pro Pro Arg Thr Val Phe Val Leu Cys Thr Thr Glu Tyr Asp 
145 150 155 160 

Lys Ile Leu Pro Thr Ile Leu Ser Arg Cys Gln Arg Ile Ile Phe Ser
165 170 175

Lys Val Arg Lys Glu Lys Val Ile Glu Tyr Leu Lys Lys Ile Cys Glu
180 185 190

Lys Glu Gly Ile Glu Cys Glu Glu Gly Ala Leu Glu Val Leu Ala His
195 200 205

Ala Ser Glu Gly Cys Met Arg Asp Ala Ala Ser Leu Leu Asp Gln Ala
210 215 220 

Ser Val Tyr Gly Glu Gly Arg Val Thr Lys Glu Val Val Glu Asn Phe 
225 230 235 240 

Leu Gly Ile Leu Ser Gln Glu Ser Val Arg Ser Phe Leu Lys Leu Leu
245 250 255

Leu Asn Ser Glu Val Asp Glu Ala Ile Lys Phe Leu Arg Glu Leu Ser
260 265 270 

Glu Lys Gly Tyr Asn Leu Thr Lys Phe Trp Glu Met Leu Glu Glu Glu 
275 280 285 

Val Arg Asn Ala Ile Leu Val Lys Ser Leu Lys Asn Pro Glu Ser Val
290 295 300

Val Gln Asn Trp Gln Asp Tyr Glu Asp Phe Lys Asp Tyr Pro Leu Glu
305 310 315 320

Ala Leu Leu Tyr Val Glu Asn Leu Ile Asn Arg Gly Lys Val Glu Ala
325 330 335 



Arg Thr Arg Glu Pro Leu Arg Ala Phe Glu Leu Ala Val Ile Lys Ser
340 345 350

Leu Ile Val Lys Asp Ile Ile Pro Val Ser Gln Leu Gly Ser Val Val
355 360 365

Lys Glu Thr Lys Lys Glu Glu Lys Lys Val Glu Val Lys Glu Glu Pro
370 375 380

Lys Val Lys Glu Glu Lys Pro Lys Glu Gln Glu Glu Asp Arg Phe Gln
385 390 395 400

Lys Val Leu Asn Ala Val Asp Gly Lys Ile Leu Lys Arg Ile Leu Glu
405 410 415

Gly Ala Lys Arg Glu Glu Arg Asp Gly Lys Ile Val Leu Lys Ile Glu
420 425 430

Ala Ser Tyr Leu Arg Thr Met Lys Lys Glu Phe Asp Ser Leu Lys Glu
435 440 445

Thr Phe Pro Phe Leu Glu Phe Glu Pro Val Glu Asp Lys Lys Lys Pro
450 455 460

Gln Lys Ser Ser Gly Thr Arg Leu Phe
465 470



<210> 121 
<211> 1090 
<212> DNA 

<213> Aquifex aeolicus 
<400> 121 

atgcgcgtta aggtggacag ggaggagctt gaagaggttc ttaaaaaagc aagagaaagc 60
acggaaaaaa aagccgcact cccgatactc gcgaacttct tactctccgc aaaagaggaa 120
aacttaatcg taagggcaac ggacttggaa aactaccttg tagtctccgt aaagggggag 180
gttgaagagg aaggagaggt ttgcgtccac tctcaaaaac tctacgatat agtcaagaac 240
ttaaattccg cttacgttta ccttcatacg gaaggtgaaa aactcgtcat aacgggagga 300
aagagtacgt acaaacttcc gacagctccc gcggaggact ttcccgaatt tccagaaatc 360
gtagaaggag gagaaacact ttcgggaaac cttctcgtta acggaataga aaaggtagag 420
tacgccatag cgaaggaaga agcgaacata gcccttcagg gaatgtatct gagaggatac 480
gaggacagaa ttcactttgt gttcggacgg tcacaggctt gcactttatg aacctctacg 540
taaacattga aaagagtgaa gacgagtctt ttgcttactt ctccactccc gagtggaaac 600
tcgccgttag ctcctggaag gagaattccc ggactacatg agtgtcatcc ctgaggagtt 660
ttcggcggaa gtcttgtttg agacagagga agtcttaaag gttttaaaga ggttgaaggc 720
tttaagcgaa ggaaaagttt ttcccgtgaa gattacctta agcgaaaacc ttgccatctt 780
tgagttcgcg gatccggagt tcggagaagc gagagaggaa attgaagtgg agtacacggg 840
agagcccttt gagataggat tcaacggaaa taccttatgg aggcgcttga cgcctacgac 900
agcgaaagag tgtggttcaa gttcacaacc cccgacacgg ccactttatt ggaggctgaa 960
gattacgaaa aggaacctta caagtgcata ataatgccga tgagggtgta gccatgaaaa 1020
aagctttaat ctttttattg agcttgagcc ttttaattcc tgcgtttagc gaagccaaac 1080
ccaagtcttc 1090



<210> 122 
<211> 363 
<212> PRT 

<213> Aquifex aeolicus 



<400> 122 

Met Arg Val Lys Val Asp Arg Glu Glu Leu Glu Glu Val Leu Lys Lys
1 5 10 15

Ala Arg Glu Ser Thr Glu Lys Lys Ala Ala Leu Pro Ile Leu Ala Asn
20 25 30

Phe Leu Leu Ser Ala Lys Glu Glu Asn Leu Ile Val Arg Ala Thr Asp
35 40 45 

Leu Glu Asn Tyr Leu Val Val Ser Val Lys Gly Glu Val Glu Glu Glu 
50 55 60 

Gly Glu Val Cys Val His Ser Gln Lys Leu Tyr Asp Ile Val Lys Asn
65 70 75 80

Leu Asn Ser Ala Tyr Val Tyr Leu His Thr Glu Gly Glu Lys Leu Val
85 90 95

Ile Thr Gly Gly Lys Ser Thr Tyr Lys Leu Pro Thr Ala Pro Ala Glu
100 105 110

Asp Phe Pro Glu Phe Pro Glu Ile Val Glu Gly Gly Glu Thr Leu Ser
115 120 125

Gly Asn Leu Leu Val Asn Gly Ile Glu Lys Val Glu Tyr Ala Ile Ala
130 135 140

Lys Glu Glu Ala Asn Ile Ala Leu Gln Gly Met Tyr Leu Arg Gly Tyr
145 150 155 160

Glu Asp Arg Ile His Phe Val Gly Ser Asp Gly His Arg Leu Ala Leu
165 170 175

Tyr Glu Pro Leu Gly Glu Phe Ser Lys Glu Leu Leu Ile Pro Arg Lys
180 185 190

Ser Leu Lys Val Leu Lys Lys Leu Ile Thr Gly Ile Glu Asp Val Asn
195 200 205

Ile Glu Lys Ser Glu Asp Glu Ser Phe Ala Tyr Phe Ser Thr Pro Glu
210 215 220 

Trp Lys Leu Ala Val Arg Leu Leu Glu Gly Glu Phe Pro Asp Tyr Met 
225 230 235 240 

Ser Val Ile Pro Glu Glu Phe Ser Ala Glu Val Leu Phe Glu Thr Glu
245 250 255 

Glu Val Leu Lys Val Leu Lys Arg Leu Lys Ala Leu Ser Glu Gly Lys 
260 265 270 

Val Phe Pro Val Lys Ile Thr Leu Ser Glu Asn Leu Ala Ile Phe Glu
275 280 285

Phe Ala Asp Pro Glu Phe Gly Glu Ala Arg Glu Glu Ile Glu Val Glu
290 295 300

Tyr Thr Gly Glu Pro Phe Glu Ile Gly Phe Asn Gly Lys Tyr Leu Met
305 310 315 320 

Glu Ala Leu Asp Ala Tyr Asp Ser Glu Arg Val Trp Phe Lys Phe Thr 
325 330 335 

Thr Pro Asp Thr Ala Thr Leu Leu Glu Ala Glu Asp Tyr Glu Lys Glu 
340 345 350 

Pro Tyr Lys Cys Ile Ile Met Pro Met Arg Val
355 360 



<210> 123 
<211> 1093 
<212> DNA 

<213> Aquifex aeolicus 
<400> 123 

gtggaaacca caatattcca gttccagaaa acttttttca caaaacctcc gaaggagagg 60 
gtcttcgtcc ttcatggaga agagcagtat ctcataagaa cctttttgtc taagctgaag 120 
gaaaagtacg gggagaatta cacggttctg tggggggatg agataagcga ggaggaattc 180
tacactgccc tttccgagac cagtatattc ggcggttcaa aggaaaaagc ggtggtcatt 240
tacaacttcg gggatttcct gaagaagctc ggaaggaaga aaaaggaaaa agaaaggctt 300
ataaaagtcc tcagaaacgt aaagagtaac tacgtattta tagtgtacga tgcgaaactc 360
cagaaacagg aactttcttc ggaacctctg aaatccgtag cgtctttcgg cggtatagtg 420
gtagcaaaca ggctgagcaa ggagaggata aaacagctcg tccttaagaa gttcaaagaa 480
aaagggataa acgtagaaaa cgatgccctt gaataccttc tccagctcac gggttacaac 540
ttgatggagc tcaaacttga ggttgaaaaa ctgatagatt acgcaagtga aaagaaaatt 600
ttaacactcg atgaggtaaa gagagtagcc ttctcagtct cagaaaacgt aaacgtattt 660
gagttcgttg atttactcct cttaaaagat tacgaaaagg ctcttaaagt tttggactcc 720
ctcatttcct tcggaataca ccccctccag attatgaaaa tcctgtcctc ctatgctcta 780
aaactttaca ccctcaagag gcttgaagag aagggagagg acctgaataa ggcgatggaa 840
agcgtgggaa taaagaacaa ctttctcaag atgaagttca aatcttactt aaaggcaaac 900
tctaaagagg acttgaagaa cctaatcctc tccctccaga ggatagacgc tttttctaaa 960
ctttactttc aggacacagt gcagttgctg gggatttctt gacctcaaga ctggagaggg 1020
aagttgtgaa aaatacttct catggtggat aatctttttt atgaagtttg cggtttgcgt 1080
ttttcccggt tct 1093



<210> 124 
<211> 350 
<212> PRT 

<213> Aquifex aeolicus 
<400> 124 

Val Glu Thr Thr Ile Phe Gln Phe Gln Lys Thr Phe Phe Thr Lys Pro
1 5 10 15

Pro Lys Glu Arg Val Phe Val Leu His Gly Glu Glu Gln Tyr Leu Ile
20 25 30

Arg Thr Phe Leu Ser Lys Leu Lys Glu Lys Tyr Gly Glu Asn Tyr Thr
35 40 45

Val Leu Trp Gly Asp Glu Ile Ser Glu Glu Glu Phe Tyr Thr Ala Leu
50 55 60

Ser Glu Thr Ser Ile Phe Gly Gly Ser Lys Glu Lys Ala Val Val Ile
65 70 75 80

Tyr Asn Phe Gly Asp Phe Leu Lys Lys Leu Gly Arg Lys Lys Lys Glu
85 90 95

Lys Glu Arg Leu Ile Lys Val Leu Arg Asn Val Lys Ser Asn Tyr Val
100 105 110

Phe Ile Val Tyr Asp Ala Lys Leu Gln Lys Gln Glu Leu Ser Ser Glu
115 120 125

Pro Leu Lys Ser Val Ala Ser Phe Gly Gly Ile Val Val Ala Asn Arg
130 135 140

Leu Ser Lys Glu Arg Ile Lys Gln Leu Val Leu Lys Lys Phe Lys Glu
145 150 155 160

Lys Gly Ile Asn Val Glu Asn Asp Ala Leu Glu Tyr Leu Leu Gln Leu
165 170 175

Thr Gly Tyr Asn Leu Met Glu Leu Lys Leu Glu Val Glu Lys Leu Ile
180 185 190

Asp Tyr Ala Ser Glu Lys Lys Ile Leu Thr Leu Asp Glu Val Lys Arg
195 200 205



Val Ala Phe Ser Val Ser Glu Asn Val Asn Val Phe Glu Phe Val Asp 
210 215 220 

Leu Leu Leu Leu Lys Asp Tyr Glu Lys Ala Leu Lys Val Leu Asp Ser 
225 230 235 240 

Leu Ile Ser Phe Gly Ile His Pro Leu Gln Ile Met Lys Ile Leu Ser
245 250 255 

Ser Tyr Ala Leu Lys Leu Tyr Thr Leu Lys Arg Leu Glu Glu Lys Gly 
260 265 270 

Glu Asp Leu Asn Lys Ala Met Glu Ser Val Gly Ile Lys Asn Asn Phe
275 280 285 

Leu Lys Met Lys Phe Lys Ser Tyr Leu Lys Ala Asn Ser Lys Glu Asp 
290 295 300 

Leu Lys Asn Leu Ile Leu Ser Leu Gln Arg Ile Asp Ala Phe Ser Lys
305 310 315 320

Leu Tyr Phe Gln Asp Thr Val Gln Leu Leu Arg Asp Phe Leu Thr Ser
325 330 335 

Arg Leu Glu Arg Glu Val Val Lys Asn Thr Ser His Gly Gly 
340 345 350 



<210> 125 
<211> 1051 
<212> DNA 

<213> Aquifex aeolicus 
<400> 125

atggaaaaag tttttttgga aaaactccag aaaaccttgc acatacccgg aggactcctt 60 
ttttacggca aagaaggaag cggaaagacg aaaacagctt ttgaatttgc aaaaggtatt 120
ttatgtaagg aaaacgtacc tggggatgcg gaagttgtcc ctcctgcaaa cacgtaaacg 180 
agctggagga agccttcttt aaaggagaaa tagaagactt taaagtttat aagacaagga 240 
cggtaaaaag cacttcgttt accttatggg cgaacatccc gactttgtgg taataatccc 300
gagcggacat tacataaaga tagaacagat aagggaagtt aagaactttg cctatgtgaa 360
gcccgcacta agcaggagaa aagtaattat aatagacgac gcccacgcga tgacctctca 420 
ggcggcaaac gctcttttaa aggtattgga agagccacct gcggacacca cctttatctt 480 
gaccacgaac aggcgttctg caatcctgcc gactatcctc tccagaactt ttcaagtgga 540 
gttcaagggc ttttcagtaa aagaggttat ggaaatagcg aaagtagacg aggaaatagc 600 
gaaactctct ggaggcagtc taaaaagggc tatcttacta aaggaaaaca aagatatcct 660 
aaacaaagta aaggaattct tggaaaacga gccgttaaaa gtttacaagc ttgcaagtga 720 
attcgaaaag tgggaacctg aaaagcaaaa actcttcctt gaaattatgg aagaattggt 780
atctcaaaaa ttgaccgaag agaaaaaaga caattacacc taccttcttg atacgatcag 840 
actctttaaa gacggactcg caaggggtgt aaacgaacct ctgtggctgt ttacgttagc 900 
cgttcaggcg gattaataaa ccgttattga ttccgtaaca tttaaacctt aatctaaatt 960 
atgagagcct ttgaaggagg tctggtatgg aaaatttgaa gattagatat atagatacga 1020
ggaagatagg aaccgtgagc ggtgtaaaag t 1051 



<210> 126 
<211> 305 
<212> PRT 

<213> Aquifex aeolicus 
<400> 126 

Met Glu Lys Val Phe Leu Glu Lys Leu Gln Lys Thr Leu His Ile Pro
1 5 10 15

Gly Gly Leu Leu Phe Tyr Gly Lys Glu Gly Ser Gly Lys Thr Lys Thr
20 25 30

Ala Phe Glu Phe Ala Lys Gly Ile Leu Cys Lys Glu Asn Val Pro Trp
35 40 45

Gly Cys Gly Ser Cys Pro Ser Cys Lys His Val Asn Glu Leu Glu Glu
50 55 60

Ala Phe Phe Lys Gly Glu Ile Glu Asp Phe Lys Val Tyr Lys Asp Lys
65 70 75 80

Asp Gly Lys Lys His Phe Val Tyr Leu Met Gly Glu His Pro Asp Phe
85 90 95

Val Val Ile Ile Pro Ser Gly His Tyr Ile Lys Ile Glu Gln Ile Arg
100 105 110

Glu Val Lys Asn Phe Ala Tyr Val Lys Pro Ala Leu Ser Arg Arg Lys
115 120 125 



Val Ile Ile Ile Asp Asp Ala His Ala Met Thr Ser Gln Ala Ala Asn
130 135 140

Ala Leu Leu Lys Val Leu Glu Glu Pro Pro Ala Asp Thr Thr Phe Ile
145 150 155 160

Leu Thr Thr Asn Arg Arg Ser Ala Ile Leu Pro Thr Ile Leu Ser Arg
165 170 175

Thr Phe Gln Val Glu Phe Lys Gly Phe Ser Val Lys Glu Val Met Glu
180 185 190

Ile Ala Lys Val Asp Glu Glu Ile Ala Lys Leu Ser Gly Gly Ser Leu
195 200 205

Lys Arg Ala Ile Leu Leu Lys Glu Asn Lys Asp Ile Leu Asn Lys Val
210 215 220 

Lys Glu Phe Leu Glu Asn Glu Pro Leu Lys Val Tyr Lys Leu Ala Ser 
225 230 235 240 

Glu Phe Glu Lys Trp Glu Pro Glu Lys Gln Lys Leu Phe Leu Glu Ile
245 250 255

Met Glu Glu Leu Val Ser Gln Lys Leu Thr Glu Glu Lys Lys Asp Asn
260 265 270

Tyr Thr Tyr Leu Leu Asp Thr Ile Arg Leu Phe Lys Asp Gly Leu Ala
275 280 285

Arg Gly Val Asn Glu Pro Leu Trp Leu Phe Thr Leu Ala Val Gln Ala
290 295 300 



Asp 
305 



<210> 127 
<211> 630
<212> DNA 

<213> Aquifex aeolicus 
<400> 127 

atgaacttcc tgaaaaagtt ccttttactg agaaaagctc aaaagtctcc ttacttcgaa 60
gagttctacg aagaaatcga tttgaaccag aaggtgaaag atgcaaggtt tgtagttttt 120
gactgcgaag ccacagaact cgacgtaaag aaggcaaaac tcctttcaat aggtgcggtt 180
gaggttaaaa acctggaaat agacctctct aaatcttttt acgagatact caaaagtgac 240
gagataaagg cggcggagat acatggaata accagggaag acgttgaaaa gtacggaaag 300
gaaccaaagg aagtaatata cgactttctg aagtacataa agggaagcgt tctcgttggc 360
tactacgtga agtttgacgt ctcactcgtt gagaagtact ccataaagta cttccagtat 420
ccaatcatca actacaagtt agacctgttt agtttcgtga agagagagta ccagagtggc 480
aggagtcttg acgaccttat gaaggaactc ggtgtagaaa taagggcaag gcacaacgcc 540
cttgaagatg cctacataac cgctcttctt ttcctaaagt acgtttaccc gaacagggag 600
tacagactaa aggatctccc gattttcctt 630



<210> 128 
<211> 210 
<212> PRT 

<213> Aquifex aeolicus



<400> 128 

Met Asn Phe Leu Lys Lys Phe Leu Leu Leu Arg Lys Ala Gln Lys Ser
1 5 10 15

Pro Tyr Phe Glu Glu Phe Tyr Glu Glu Ile Asp Leu Asn Gln Lys Val
20 25 30

Lys Asp Ala Arg Phe Val Val Phe Asp Cys Glu Ala Thr Glu Leu Asp
35 40 45

Val Lys Lys Ala Lys Leu Leu Ser Ile Gly Ala Val Glu Val Lys Asn
50 55 60

Leu Glu Ile Asp Leu Ser Lys Ser Phe Tyr Glu Ile Leu Lys Ser Asp
65 70 75 80

Glu Ile Lys Ala Ala Glu Ile His Gly Ile Thr Arg Glu Asp Val Glu
85 90 95

Lys Tyr Gly Lys Glu Pro Lys Glu Val Ile Tyr Asp Phe Leu Lys Tyr
100 105 110

Ile Lys Gly Ser Val Leu Val Gly Tyr Tyr Val Lys Phe Asp Val Ser
115 120 125

Leu Val Glu Lys Tyr Ser Ile Lys Tyr Phe Gln Tyr Pro Ile Ile Asn
130 135 140

Tyr Lys Leu Asp Leu Phe Ser Phe Val Lys Arg Glu Tyr Gln Ser Gly
145 150 155 160

Arg Ser Leu Asp Asp Leu Met Lys Glu Leu Gly Val Glu Ile Arg Ala
165 170 175

Arg His Asn Ala Leu Glu Asp Ala Tyr Ile Thr Ala Leu Leu Phe Leu
180 185 190

Lys Tyr Val Tyr Pro Asn Arg Glu Tyr Arg Leu Lys Asp Leu Pro Ile
195 200 205

Phe Leu
210



<210> 129 
<211> 526 
<212> DNA 

<213> Aquifex aeolicus 
<400> 129 

atgctcaata aggtttttat aataggaaga cttacgggtg accccgttat aacttatcta 60
ccgagcggaa cgcccgtagt agagtttact ctggcttaca acagaaggta taaaaaccag 120
aacggtgaat ttcaggagga aagtcacttc tttgacgtaa aggcgtacgg aaaaatggct 180
gaagactggg ctacacgctt ctcgaaagga tacctcgtac tcgtagaggg aagactctcc 240
caggaaaagt gggagaaaga aggaaagaag ttctcaaagg tcaggataat agcggaaaac 300
gtaagattaa taaacaggcc gaaaggtgct gaacttcaag cagaagaaga ggaggaagtt 360
cctcccattg aggaggaaat tgaaaaactc ggtaaagagg aagagaagcc ttttaccgat 420
gaagaggacg aaataccttt ttaattttga ggaggttaaa gtatggtagt gagagctcct 480
aagaagaaag tttgtatgta ctgtgaacaa aagagagagc cagatt 526



<210> 130 
<211> 147 
<212> PRT 

<213> Aquifex aeolicus 



<400> 130 

Met Leu Asn Lys Val Phe Ile Ile Gly Arg Leu Thr Gly Asp Pro Val
1 5 10 15

Ile Thr Tyr Leu Pro Ser Gly Thr Pro Val Val Glu Phe Thr Leu Ala
20 25 30

Tyr Asn Arg Arg Tyr Lys Asn Gln Asn Gly Glu Phe Gln Glu Glu Ser
35 40 45

His Phe Phe Asp Val Lys Ala Tyr Gly Lys Met Ala Glu Asp Trp Ala
50 55 60

Thr Arg Phe Ser Lys Gly Tyr Leu Val Leu Val Glu Gly Arg Leu Ser
65 70 75 80

Gln Glu Lys Trp Glu Lys Glu Gly Lys Lys Phe Ser Lys Val Arg Ile
85 90 95

Ile Ala Glu Asn Val Arg Leu Ile Asn Arg Pro Lys Gly Ala Glu Leu
100 105 110

Gln Ala Glu Glu Glu Glu Glu Val Pro Pro Ile Glu Glu Glu Ile Glu
115 120 125

Lys Leu Gly Lys Glu Glu Glu Lys Pro Phe Thr Asp Glu Glu Asp Glu
130 135 140

Ile Pro Phe
145



<210> 131 
<211> 1472 
<212> DNA 

<213> Aquifex aeolicus 
<400> 131 

atgcaatttg tggataaact tccctgtgac gaatccgccg agagggcggt tcttggcagt 60
atgcttgaag accccgaaaa catacctctg gtacttgaat accttaaaga agaagacttc 120
tgcatagacg agcacaagct acttttcagg gttcttacaa acctctggtc cgagtacggc 180
aataagctcg atttcgtatt aataaaggat caccttgaaa agaaaaactt actccagaaa 240
atacctatag actggctcga agaactctac gaggaggcgg tatcccctga cacgcttgag 300
gaagtctgca aaatagtaaa acaacgttcc gcacagaggg cgataattca actcggtata 360
gaactcattc acaaaggaaa ggaaaacaaa gactttcaca cattaatcga ggaagcccag 420
agcaggatat tttccatagc ggaaagtgct acatctacgc agttttacca tgtgaaagac 480
gttgcggaag aagttataga actcatttat aaattcaaaa gctctgacag gctagtcacg 540
ggactcccaa gcggtttcac ggaactcgat ctaaagacga cgggattcca ccctggagac 600
ttaataatac tcgccgcaag acccggtatg gggaaaaccg cctttatgct ctccataatc 660
tacaatctcg caaaagacga gggaaaaccc tcagctgtat tttccttgga aatgagcaag 720
gaacagctcg ttatgagact cctctctatg atgtcggagg tcccactttt caagataagg 780
tctggaagta tatcgaatga agatttaaag aagcttgaag caagcgcaat agaactcgca 840
aagtacgaca tatacctcga cgacacaccc gctctcacta caacggattt aaggataagg 900
gcaagaaagc tcagaaagga aaaggaagtt gagttcgtgg cggtggacta cttgcaactt 960
ctgagaccgc cagtccgaaa gagttcaaga caggaggaag tggcagaggt ttcaagaaac 1020
ttaaaagccc ttgcaaagga acttcacatt cccgttatgg cacttgcgca gctctcccgt 1080
gaggtggaaa agaggagtga taaaagaccc cagcttgcgg acctcagaga atccggacag 1140
atagaacagg acgcagacct aatccttttc ctccacagac ccgagtacta caagaaaaag 1200
ccaaatcccg aagagcaggg tatagcggaa gtgataatag ccaagcaaag gcaaggaccc 1260
acggacattg tgaagctcgc atttattaag gagtacacta agtttgcaaa cctagaagcc 1320
cttcctgaac aacctcctga agaagaggaa ctttccgaaa ttattgaaac acaggaggat 1380
gaaggattcg aagatattga cttctgaaaa ttaaggtttt ataattttat cttggctatc 1440
cggggtagct caatcggcag agcgggtggc tg 1472



<210> 132 
<211> 438 
<212> PRT 

<213> Aquifex aeolicus 
<400> 132 

Met Gln Phe Val Asp Lys Leu Pro Cys Asp Glu Ser Ala Glu Arg Ala
1 5 10 15

Val Leu Gly Ser Met Leu Glu Asp Pro Glu Asn Ile Pro Leu Val Leu
20 25 30

Glu Tyr Leu Lys Glu Glu Asp Phe Cys Ile Asp Glu His Lys Leu Leu
35 40 45 

Phe Arg Val Leu Thr Asn Leu Trp Ser Glu Tyr Gly Asn Lys Leu Asp 
50 55 60 

Phe Val Leu He Lys Asp His Leu Glu Lys Lys Asn Leu Leu Gin Lys 
65 70 75 80 

He Pro He Asp Trp Leu Glu Glu Leu Tyr Glu Glu Ala Val Ser Pro 



Asp Thr Leu Glu Glu Val Cys Lys He Val Lys Gin Arg Ser Ala Gin 
100 105 HO 

Arg Ala He He Gin Leu Gly He Thr Ser Thr Gin Phe Tyr His Val 
115 120 125 

Lys Asp Val Ala Glu Glu Val He Glu Leu He Tyr Lys Phe Lys Ser 
130 135 140 

Ser Asp Arg Leu Val Thr Gly Leu Pro Ser Gly Phe Thr Glu Leu Asp 
145 150 155 160 

Leu Lys Thr Thr Gly Phe His Pro Gly Asp Leu He He Leu Ala Ala 
165 170 175 

Arg Pro Gly Met Gly Lys Thr Ala Phe Met Leu Ser He He Tyr Asn 
180 185 190 

Leu Ala Lys Asp Glu Gly Lys Pro Ser Ala Val Phe Ser Leu Glu Met 
195 200 205 
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Ser Lys Glu Gin Leu Val Met Arg Leu Leu Ser Met Met Ser Glu Val 
210 215 220 

Pro Leu Phe Lys He Arg Ser Gly Ser He Ser Asn Glu Asp Leu Lys 
225 230 235 240 

Lys Leu Glu Ala Ser Ala He Glu Leu Ala Lys Tyr Asp He Tyr Leu 
245 250 255 

Asp Asp Thr Pro Ala Leu Thr Thr Thr Asp Leu Arg He Arg Ala Arg 
260 255 270 

Lys Leu Arg Lys Glu Lys Glu Val Glu Phe Val Ala Val Asp Tyr Leu 
275 280 285 

Gin Leu Leu Arg Pro Pro Val Arg Lys Ser Ser Arg Gin Glu Glu Val 
290 295 300 

Ala Glu Val Ser Arg Asn Leu Lys Ala Leu Ala Lys Glu Leu His lie 
305 310 315 320 

Pro Val Met Ala Leu Ala Gin Leu Ser Arg Glu Val Glu Lys Arg Ser 
325 330 335 

Asp Lys Arg Pro Gin Leu Ala Asp Leu Arg Glu Ser Gly Gin He Glu 
340 345 350 

Gin Asp Ala Asp Leu He Leu Phe Leu His Arg Pro Glu Tyr Tyr Lys 
355 360 365 

Lys Lys Pro Asn Pro Glu Glu Gin Gly He Ala Glu Val He He Ala 
370 375 380 

Lys Gin Arg Gin Gly Pro Thr Asp He Val Lys Leu Ala Phe He Lys 
385 390 395 400 

Glu Tyr Thr Lys Phe Ala Asn Leu Glu Ala Leu Pro Glu Gin Pro Pro 
405 410 415 

Glu Glu Glu Glu Leu Ser Glu He He Glu Thr Gin Glu Asp Glu Gly 
420 425 430 

Phe Glu Asp He Asp Phe 
435 



<210> 133 
<211> 1526 
<212> DNA

<213> Aquifex aeolicus 
<400> 133 

atgtcctcgg acatagacga acttagacgg gaaatagata tagtagacgt catttccgaa 60
tacttaaact tagagaaggt aggttccaat tacagaacga actgtccctt tcaccctgac 120
gatacaccct ccttttacgt gtctccaagt aaacaaatat tcaagtgttt cggttgcggg 180
gtagggggag acgcgataaa gttcgtttcc ctttacgagg acatctccta ttttgaagcc 240
gcccttgaac tcgcaaaacg ctacggaaag aaattagacc ttgaaaagat atcaaaagac 300
gaaaaggtat acgtggctct tgacagggtt tgtgatttct acagggaaag ccttctcaaa 360
aacagagagg caagtgagta cgtaaagagt aggggaatag accctaaagt agcgaggaag 420
tttgatcttg ggtacgcacc ttccagtgaa gcactcgtaa aagtcttaaa agagaacgat 480
cttttagagg cttaccttga aactaaaaac ctcctttctc ctacgaaggg tgtttacagg 540
gatctctttc ttcggcgtgt cgtgatcccg ataaaggatc cgaggggaag agttataggt 600
ttcggtggaa ggaggatagt agaggacaaa tctcccaagt acataaactc tccagacagc 660
agggtattta aaaaggggga gaacttattc ggtctttacg aggcaaagga gtatataaag 720
gaagaaggat ttgcgatact tgtggaaggg tactttgacc ttttgagact tttttccgag 780
ggaataagga acgttgttgc acccctcggt acagccctga cccaaaatca ggcaaacctc 840
ctttccaagt tcacaaaaaa ggtctacatc ctttacgacg gagatgatgc gggaagaaag 900
gctatgaaaa gtgccattcc cctactcctc agtgcaggag tggaagttta tcccgtttac 960
ctccccgaag gatacgatcc cgacgagttt ataaaggaat tcgggaaaga ggaattaaga 1020
agactgataa acagctcagg ggagctcttt gaaacgctca taaaaaccgc aagggaaaac 1080
ttagaggaga aaacgcgtga gttcaggtat tatctgggct ttatttccga tggagtaagg 1140
cgctttgctc tggcttcgga gtttcacacc aagtacaaag ttcctatgga aattttatta 1200
atgaaaattg aaaaaaattc tcaagaaaaa gaaattaaac tctcctttaa ggaaaaaatc 1260
ttcctgaaag gactgataga attaaaacca aaaatagacc ttgaagtcct gaacttaagt 1320
cctgagttaa aggaactcgc agttaacgcc ttaaacggag aggagcattt acttccaaaa 1380
gaagttctcg agtaccaggt ggataacttg gagaaacttt ttaacaacat ccttagggat 1440
ttacaaaaat ctgggaaaaa gaggaagaaa agagggttga aaaatgtaaa tacttaatta 1500
actttaataa atttttagag ttagga 1526



<210> 134 
<211> 498 
<212> PRT 

<213> Aquifex aeolicus 
<400> 134 

Met Ser Ser Asp Ile Asp Glu Leu Arg Arg Glu Ile Asp Ile Val Asp
1 5 10 15

Val Ile Ser Glu Tyr Leu Asn Leu Glu Lys Val Gly Ser Asn Tyr Arg
20 25 30

Thr Asn Cys Pro Phe His Pro Asp Asp Thr Pro Ser Phe Tyr Val Ser
35 40 45

Pro Ser Lys Gln Ile Phe Lys Cys Phe Gly Cys Gly Val Gly Gly Asp
50 55 60

Ala Ile Lys Phe Val Ser Leu Tyr Glu Asp Ile Ser Tyr Phe Glu Ala
65 70 75 80

Ala Leu Glu Leu Ala Lys Arg Tyr Gly Lys Lys Leu Asp Leu Glu Lys
85 90 95

Ile Ser Lys Asp Glu Lys Val Tyr Val Ala Leu Asp Arg Val Cys Asp
100 105 110

Phe Tyr Arg Glu Ser Leu Leu Lys Asn Arg Glu Ala Ser Glu Tyr Val
115 120 125

Lys Ser Arg Gly Ile Asp Pro Lys Val Ala Arg Lys Phe Asp Leu Gly
130 135 140

Tyr Ala Pro Ser Ser Glu Ala Leu Val Lys Val Leu Lys Glu Asn Asp
145 150 155 160

Leu Leu Glu Ala Tyr Leu Glu Thr Lys Asn Leu Leu Ser Pro Thr Lys
165 170 175

Gly Val Tyr Arg Asp Leu Phe Leu Arg Arg Val Val Ile Pro Ile Lys
180 185 190

Asp Pro Arg Gly Arg Val Ile Gly Phe Gly Gly Arg Arg Ile Val Glu
195 200 205

Asp Lys Ser Pro Lys Tyr Ile Asn Ser Pro Asp Ser Arg Val Phe Lys
210 215 220

Lys Gly Glu Asn Leu Phe Gly Leu Tyr Glu Ala Lys Glu Tyr Ile Lys
225 230 235 240

Glu Glu Gly Phe Ala Ile Leu Val Glu Gly Tyr Phe Asp Leu Leu Arg
245 250 255

Leu Phe Ser Glu Gly Ile Arg Asn Val Val Ala Pro Leu Gly Thr Ala
260 265 270

Leu Thr Gln Asn Gln Ala Asn Leu Leu Ser Lys Phe Thr Lys Lys Val
275 280 285

Tyr Ile Leu Tyr Asp Gly Asp Asp Ala Gly Arg Lys Ala Met Lys Ser
290 295 300

Ala Ile Pro Leu Leu Leu Ser Ala Gly Val Glu Val Tyr Pro Val Tyr
305 310 315 320

Leu Pro Glu Gly Tyr Asp Pro Asp Glu Phe Ile Lys Glu Phe Gly Lys
325 330 335

Glu Glu Leu Arg Arg Leu Ile Asn Ser Ser Gly Glu Leu Phe Glu Thr
340 345 350

Leu Ile Lys Thr Ala Arg Glu Asn Leu Glu Glu Lys Thr Arg Glu Phe
355 360 365

Arg Tyr Tyr Leu Gly Phe Ile Ser Asp Gly Val Arg Arg Phe Ala Leu
370 375 380

Ala Ser Glu Phe His Thr Lys Tyr Lys Val Pro Met Glu Ile Leu Leu
385 390 395 400

Met Lys Ile Glu Lys Asn Ser Gln Glu Lys Glu Ile Lys Leu Ser Phe
405 410 415

Lys Glu Lys Ile Phe Leu Lys Gly Leu Ile Glu Leu Lys Pro Lys Ile
420 425 430

Asp Leu Glu Val Leu Asn Leu Ser Pro Glu Leu Lys Glu Leu Ala Val
435 440 445

Asn Ala Leu Asn Gly Glu Glu His Leu Leu Pro Lys Glu Val Leu Glu
450 455 460

Tyr Gln Val Asp Asn Leu Glu Lys Leu Phe Asn Asn Ile Leu Arg Asp
465 470 475 480

Leu Gln Lys Ser Gly Lys Lys Arg Lys Lys Arg Gly Leu Lys Asn Val
485 490 495

Asn Thr



<210> 135 
<211> 705 
<212> DNA 

<213> Aquifex aeolicus 
<400> 135 

atgcaagata ccgctacctg cagtatttgt caggggacgg gattcgtaaa gaccgaagac 60
aacaaggtaa ggctctgcga atgcaggttc aagaaaaggg atgtaaacag ggaactaaac 120
atcccaaaga ggtactggaa cgccaactta gacacttacc accccaagaa cgtatcccag 180
aacagggcac ttttgacgat aagggtcttc gtccacaact tcaatcccga ggaagggaaa 240
gggcttacct ttgtaggatc tcctggagtc ggcaaaactc accttgcggt tgcaacatta 300
aaagcgattt atgagaagaa gggaatcaga ggatacttct tcgatacgaa ggatctaata 360
ttcaggttaa aacacttaat ggacgaggga aaggatacaa agtttttaaa aactgtctta 420
aactcaccgg ttttggttct cgacgacctc ggttctgaga ggctcagtga ctggcagagg 480
gaactcatct cttacataat cacttacagg tataacaacc ttaagagcac gataataacc 540
acgaattact cactccagag ggaagaagag agtagcgtga ggataagtgc ggatcttgca 600
agcagactcg gagaaaacgt agtttcaaaa atttacgaga tgaacgagtt gctcgttata 660
aagggttccg acctcaggaa gtctaaaaag ctatcaaccc catct 705



<210> 136 
<211> 235 
<212> PRT 

<213> Aquifex aeolicus 



<400> 136 

Met Gln Asp Thr Ala Thr Cys Ser Ile Cys Gln Gly Thr Gly Phe Val
1 5 10 15

Lys Thr Glu Asp Asn Lys Val Arg Leu Cys Glu Cys Arg Phe Lys Lys
20 25 30

Arg Asp Val Asn Arg Glu Leu Asn Ile Pro Lys Arg Tyr Trp Asn Ala
35 40 45

Asn Leu Asp Thr Tyr His Pro Lys Asn Val Ser Gln Asn Arg Ala Leu
50 55 60

Leu Thr Ile Arg Val Phe Val His Asn Phe Asn Pro Glu Glu Gly Lys
65 70 75 80

Gly Leu Thr Phe Val Gly Ser Pro Gly Val Gly Lys Thr His Leu Ala
85 90 95

Val Ala Thr Leu Lys Ala Ile Tyr Glu Lys Lys Gly Ile Arg Gly Tyr
100 105 110

Phe Phe Asp Thr Lys Asp Leu Ile Phe Arg Leu Lys His Leu Met Asp
115 120 125

Glu Gly Lys Asp Thr Lys Phe Leu Lys Thr Val Leu Asn Ser Pro Val
130 135 140

Leu Val Leu Asp Asp Leu Gly Ser Glu Arg Leu Ser Asp Trp Gln Arg
145 150 155 160

Glu Leu Ile Ser Tyr Ile Ile Thr Tyr Arg Tyr Asn Asn Leu Lys Ser
165 170 175

Thr Ile Ile Thr Thr Asn Tyr Ser Leu Gln Arg Glu Glu Glu Ser Ser
180 185 190

Val Arg Ile Ser Ala Asp Leu Ala Ser Arg Leu Gly Glu Asn Val Val
195 200 205

Ser Lys Ile Tyr Glu Met Asn Glu Leu Leu Val Ile Lys Gly Ser Asp
210 215 220

Leu Arg Lys Ser Lys Lys Leu Ser Thr Pro Ser
225 230 235



<210> 137 
<211> 4101 
<212> DNA 

<213> Thermotoga maritima
<400> 137 

atgaaaaaga ttgaaaattt gaagtggaaa aatgtctcgt ttaaaagcct ggaaatagat 60
cccgatgcag gtgtggttct cgtttccgtg gaaaaattct ccgaagagat agaagacctt 120
gtgcgtttac tggagaagaa gacgcggttt cgagtcatcg tgaacggtgt tcaaaaaagt 180
aacggggatc taaggggaaa gatactttcc cttctcaacg gtaatgtgcc ttacataaaa 240
gatgttgttt tcgaaggaaa caggctgatt ctgaaagtgc ttggagattt cgcgcgggac 300
aggatcgcct ccaaactcag aagcacgaaa aaacagctcg atgaactgct gcctcccgga 360
acagagatca tgctggaggt tgtggagcct ccggaagatc ttttgaaaaa ggaagtacca 420
caaccagaaa agagagaaga accaaagggt gaagaattga agatcgagga tgaaaaccac 480
atctttggac agaaacccag aaagatcgtc ttcaccccct caaaaatctt tgagtacaac 540
aaaaagacat cggtgaaggg caagatcttc aaaatagaga agatcgaggg gaaaagaacg 600
gtccttctga tttacctgac agacggagaa gattctctga tctgcaaagt cttcaacgac 660
gttgaaaagg tcgaagggaa agtatcggtg ggagacgtga tcgttgccac aggagacctc 720
cttctcgaaa acggggagcc caccctttac gtgaagggaa tcacaaaact tcccgaagcg 780
aaaaggatgg acaaatctcc ggttaagagg gtggagctcc acgcccatac caagttcagc 840
gatcaggacg caataacaga tgtgaacgaa tatgtgaaac gagccaagga atggggcttt 900
cccgcgatag ccctcacgga tcatgggaac gttcaggcca taccttactt ctacgacgcg 960
gcgaaagaag ctggaataaa gcccattttc ggtatcgaag cgtatctggt gagtgacgtg 1020
gagcccgtca taaggaatct ctccgacgat tcgacgtttg gagatgccac gttcgtcgtc 1080
ctcgacttcg agacgacggg tctcgacccg caggtggatg agatcatcga gataggagcg 1140
gtgaagatac agggtggcca gatagtggac gagtaccaca ctctcataaa gccttccagg 1200
gagatctcaa gaaaaagttc ggagatcacc ggaatcactc aagagatgct ggaaaacaag 1260
agaagcatcg aggaagttct gccggagttc ctcggttttc tggaagattc catcatcgta 1320
gcacacaacg ccaacttcga ctacagattt ctgaggctgt ggatcaaaaa agtgatggga 1380
ttggactggg aaagacccta catagatacg ctcgccctcg caaagtccct tctcaaactg 1440
agaagctact ctctggattc cgttgtggaa aagctcggat tgggtccctt ccggcaccac 1500
agggccctgg atgacgcgag ggtcaccgct caggttttcc tcaggttcgt tgagatgatg 1560
aagaagatcg gtatcacgaa gctttcagaa atggagaagt tgaaggatac gatagactac 1620
accgcgttga aacccttcca ctgcacgatc ctcgttcaga acaaaaaggg attgaaaaac 1680
ctatacaaac tggtttctga ttcctatata aagtacttct acggtgttcc gaggatcctc 1740
aaaagtgagc tcatcgagaa cagagaagga ctgctcgtgg gtagcgcgtg tatctccggt 1800
gagctcggac gtgccgccct cgaaggagcg agtgattcag aactcgaaga gatcgcgaag 1860
ttctacgact acatagaagt catgccgctc gacgttatag ccgaagatga agaagaccta 1920
gacagagaaa gactgaaaga agtgtaccga aaactctaca gaatagcgaa aaaattgaac 1980
aagttcgtcg tcatgaccgg tgatgttcat ttcctcgatc ccgaagatgc caggggcaga 2040
gctgcacttc tggcacctca gggaaacaga aacttcgaga atcagcccgc actctacctc 2100
agaacgaccg aagaaatgct cgagaaggcg atagagatat tcgaagatga agagatcgcg 2160
agggaagtcg tgatagagaa tcccaacaga atagccgata tgatcgagga agtgcagccg 2220
ctcgagaaaa aacttcaccc gccgatcata gagaacgccg atgaaatagt gagaaacctc 2280
accatgaagc gggcgtacga gatctacggt gatccgcttc ccgaaatcgt ccagaagcgt 2340
gtggaaaagg aactgaacgc catcataaat catggatacg ccgttctcta tctcatcgct 2400
caggagctcg ttcagaaatc tatgagcgat ggttacgtgg ttggatccag aggatccgtc 2460
gggtcttcac tcgtggccaa tctcctcgga ataacagagg tgaatcccct accaccacat 2520
tacaggtgtc cagagtgcaa atactttgaa gttgtcgaag acgacagata cggagcgggt 2580
tacgaccttc ccaacaagaa ctgtccaaga tgtggggctc ctctcagaaa agacggccac 2640
ggcataccgt ttgaaacgtt catggggttc gagggtgaca aggtccccga catagatctc 2700
aacttctcag gagagtatca ggaacgtgct catcgttttg tggaagaact cttcggtaaa 2760
gaccacgtct atagggcggg aaccataaac accatcgcgg aaagaagtgc ggtgggttac 2820
gtgagaagct acgaagagaa aaccggaaag aagctcagaa aggcggaaat ggaaagactc 2880
gtttccatga tcacgggagt gaagagaacg acgggtcagc acccaggggg gctcatgatc 2940
ataccgaaag acaaagaagt ctacgatttc actcccatac agtatccagc caacgataga 3000
aacgcaggtg tgttcaccac gcacttcgca tacgagacga tccatgatga cctggtgaag 3060
atagatgcgc tcggccacga tgatcccact ttcatcaaga tgctcaagga cctcaccgga 3120
atcgatccca tgacgattcc catggatgac cccgatacgc tcgccatatt cagttctgtg 3180
aagcctcttg gtgtggatcc cgttgagctg gaaagcgatg tgggaacgta cggaattccg 3240
gagttcggaa ccgagtttgt gaggggaatg ctcgttgaaa cgagaccaaa gagtttcgcc 3300
gagcttgtga gaatctcagg actgtcacac ggtacggacg tctggttgaa caacgcacgt 3360
gattggataa acctcggcta cgccaagctc tccgaggtta tctcgtgtag ggacgacatc 3420
atgaacttcc tcatacacaa aggaatggaa ccgtcacttg ccttcaagat catggaaaac 3480
gtcaggaagg gaaagggtat cacagaagag atggagagcg agatgagaag gctgaaggtt 3540
ccagaatggt tcatcgaatc ctgtaaaagg atcaaatatc tcttcccgaa agctcacgct 3600
gtggcttacg tgagtatggc cttcagaatt gcttacttca aggttcacta tcctcttcag 3660
ttttacgcgg cgtacttcac gataaaaggt gatcagttcg atccggttct cgtactcagg 3720
ggaaaagaag ccataaagag gcgcttgaga gaactcaaag cgatgcctgc caaagacgcc 3780
cagaagaaaa acgaagtgag tgttctggag gttgccctgg aaatgatact gagaggtttt 3840
tccttcctac cgcccgacat cttcaaatcc gacgcgaaga aatttctgat agaaggaaac 3900
tcgctgagaa ttccgttcaa caaacttcca ggactgggtg acagcgttgc cgagtcgata 3960
atcagagcca gggaagaaaa gccgttcact tcggtggaag atctcatgaa gaggaccaag 4020
gtcaacaaaa atcacataga gctgatgaaa agcctgggtg ttctcgggga ccttccagag 4080
acggaacagt tcacgctttt c 4101



<210> 138 
<211> 1367 
<212> PRT 

<213> Thermotoga maritima
<400> 138

Met Lys Lys Ile Glu Asn Leu Lys Trp Lys Asn Val Ser Phe Lys Ser
1 5 10 15

Leu Glu Ile Asp Pro Asp Ala Gly Val Val Leu Val Ser Val Glu Lys
20 25 30

Phe Ser Glu Glu Ile Glu Asp Leu Val Arg Leu Leu Glu Lys Lys Thr
35 40 45

Arg Phe Arg Val Ile Val Asn Gly Val Gln Lys Ser Asn Gly Asp Leu
50 55 60

Arg Gly Lys Ile Leu Ser Leu Leu Asn Gly Asn Val Pro Tyr Ile Lys
65 70 75 80

Asp Val Val Phe Glu Gly Asn Arg Leu Ile Leu Lys Val Leu Gly Asp
85 90 95

Phe Ala Arg Asp Arg Ile Ala Ser Lys Leu Arg Ser Thr Lys Lys Gln
100 105 110

Leu Asp Glu Leu Leu Pro Pro Gly Thr Glu Ile Met Leu Glu Val Val
115 120 125

Glu Pro Pro Glu Asp Leu Leu Lys Lys Glu Val Pro Gln Pro Glu Lys
130 135 140

Arg Glu Glu Pro Lys Gly Glu Glu Leu Lys Ile Glu Asp Glu Asn His
145 150 155 160

Ile Phe Gly Gln Lys Pro Arg Lys Ile Val Phe Thr Pro Ser Lys Ile
165 170 175

Phe Glu Tyr Asn Lys Lys Thr Ser Val Lys Gly Lys Ile Phe Lys Ile
180 185 190

Glu Lys Ile Glu Gly Lys Arg Thr Val Leu Leu Ile Tyr Leu Thr Asp
195 200 205

Gly Glu Asp Ser Leu Ile Cys Lys Val Phe Asn Asp Val Glu Lys Val
210 215 220

Glu Gly Lys Val Ser Val Gly Asp Val Ile Val Ala Thr Gly Asp Leu
225 230 235 240

Leu Leu Glu Asn Gly Glu Pro Thr Leu Tyr Val Lys Gly Ile Thr Lys
245 250 255



Leu Pro Glu Ala Lys Arg Met Asp Lys Ser Pro Val Lys Arg Val Glu
260 265 270

Leu His Ala His Thr Lys Phe Ser Asp Gln Asp Ala Ile Thr Asp Val
275 280 285

Asn Glu Tyr Val Lys Arg Ala Lys Glu Trp Gly Phe Pro Ala Ile Ala
290 295 300

Leu Thr Asp His Gly Asn Val Gln Ala Ile Pro Tyr Phe Tyr Asp Ala
305 310 315 320

Ala Lys Glu Ala Gly Ile Lys Pro Ile Phe Gly Ile Glu Ala Tyr Leu
325 330 335

Val Ser Asp Val Glu Pro Val Ile Arg Asn Leu Ser Asp Asp Ser Thr
340 345 350

Phe Gly Asp Ala Thr Phe Val Val Leu Asp Phe Glu Thr Thr Gly Leu
355 360 365

Asp Pro Gln Val Asp Glu Ile Ile Glu Ile Gly Ala Val Lys Ile Gln
370 375 380

Gly Gly Gln Ile Val Asp Glu Tyr His Thr Leu Ile Lys Pro Ser Arg
385 390 395 400

Glu Ile Ser Arg Lys Ser Ser Glu Ile Thr Gly Ile Thr Gln Glu Met
405 410 415

Leu Glu Asn Lys Arg Ser Ile Glu Glu Val Leu Pro Glu Phe Leu Gly
420 425 430

Phe Leu Glu Asp Ser Ile Ile Val Ala His Asn Ala Asn Phe Asp Tyr
435 440 445

Arg Phe Leu Arg Leu Trp Ile Lys Lys Val Met Gly Leu Asp Trp Glu
450 455 460

Arg Pro Tyr Ile Asp Thr Leu Ala Leu Ala Lys Ser Leu Leu Lys Leu
465 470 475 480

Arg Ser Tyr Ser Leu Asp Ser Val Val Glu Lys Leu Gly Leu Gly Pro
485 490 495

Phe Arg His His Arg Ala Leu Asp Asp Ala Arg Val Thr Ala Gln Val
500 505 510



Phe Leu Arg Phe Val Glu Met Met Lys Lys Ile Gly Ile Thr Lys Leu
515 520 525

Ser Glu Met Glu Lys Leu Lys Asp Thr Ile Asp Tyr Thr Ala Leu Lys
530 535 540

Pro Phe His Cys Thr Ile Leu Val Gln Asn Lys Lys Gly Leu Lys Asn
545 550 555 560

Leu Tyr Lys Leu Val Ser Asp Ser Tyr Ile Lys Tyr Phe Tyr Gly Val
565 570 575

Pro Arg Ile Leu Lys Ser Glu Leu Ile Glu Asn Arg Glu Gly Leu Leu
580 585 590

Val Gly Ser Ala Cys Ile Ser Gly Glu Leu Gly Arg Ala Ala Leu Glu
595 600 605

Gly Ala Ser Asp Ser Glu Leu Glu Glu Ile Ala Lys Phe Tyr Asp Tyr
610 615 620

Ile Glu Val Met Pro Leu Asp Val Ile Ala Glu Asp Glu Glu Asp Leu
625 630 635 640

Asp Arg Glu Arg Leu Lys Glu Val Tyr Arg Lys Leu Tyr Arg Ile Ala
645 650 655

Lys Lys Leu Asn Lys Phe Val Val Met Thr Gly Asp Val His Phe Leu
660 665 670

Asp Pro Glu Asp Ala Arg Gly Arg Ala Ala Leu Leu Ala Pro Gln Gly
675 680 685

Asn Arg Asn Phe Glu Asn Gln Pro Ala Leu Tyr Leu Arg Thr Thr Glu
690 695 700

Glu Met Leu Glu Lys Ala Ile Glu Ile Phe Glu Asp Glu Glu Ile Ala
705 710 715 720

Arg Glu Val Val Ile Glu Asn Pro Asn Arg Ile Ala Asp Met Ile Glu
725 730 735

Glu Val Gln Pro Leu Glu Lys Lys Leu His Pro Pro Ile Ile Glu Asn
740 745 750

Ala Asp Glu Ile Val Arg Asn Leu Thr Met Lys Arg Ala Tyr Glu Ile
755 760 765



Tyr Gly Asp Pro Leu Pro Glu Ile Val Gln Lys Arg Val Glu Lys Glu
770 775 780

Leu Asn Ala Ile Ile Asn His Gly Tyr Ala Val Leu Tyr Leu Ile Ala
785 790 795 800

Gln Glu Leu Val Gln Lys Ser Met Ser Asp Gly Tyr Val Val Gly Ser
805 810 815

Arg Gly Ser Val Gly Ser Ser Leu Val Ala Asn Leu Leu Gly Ile Thr
820 825 830

Glu Val Asn Pro Leu Pro Pro His Tyr Arg Cys Pro Glu Cys Lys Tyr
835 840 845

Phe Glu Val Val Glu Asp Asp Arg Tyr Gly Ala Gly Tyr Asp Leu Pro
850 855 860

Asn Lys Asn Cys Pro Arg Cys Gly Ala Pro Leu Arg Lys Asp Gly His
865 870 875 880

Gly Ile Pro Phe Glu Thr Phe Met Gly Phe Glu Gly Asp Lys Val Pro
885 890 895

Asp Ile Asp Leu Asn Phe Ser Gly Glu Tyr Gln Glu Arg Ala His Arg
900 905 910

Phe Val Glu Glu Leu Phe Gly Lys Asp His Val Tyr Arg Ala Gly Thr
915 920 925

Ile Asn Thr Ile Ala Glu Arg Ser Ala Val Gly Tyr Val Arg Ser Tyr
930 935 940

Glu Glu Lys Thr Gly Lys Lys Leu Arg Lys Ala Glu Met Glu Arg Leu
945 950 955 960

Val Ser Met Ile Thr Gly Val Lys Arg Thr Thr Gly Gln His Pro Gly
965 970 975

Gly Leu Met Ile Ile Pro Lys Asp Lys Glu Val Tyr Asp Phe Thr Pro
980 985 990

Ile Gln Tyr Pro Ala Asn Asp Arg Asn Ala Gly Val Phe Thr Thr His
995 1000 1005

Phe Ala Tyr Glu Thr Ile His Asp Asp Leu Val Lys Ile Asp Ala Leu
1010 1015 1020

Gly His Asp Asp Pro Thr Phe Ile Lys Met Leu Lys Asp Leu Thr Gly
1025 1030 1035 1040

Ile Asp Pro Met Thr Ile Pro Met Asp Asp Pro Asp Thr Leu Ala Ile
1045 1050 1055

Phe Ser Ser Val Lys Pro Leu Gly Val Asp Pro Val Glu Leu Glu Ser
1060 1065 1070

Asp Val Gly Thr Tyr Gly Ile Pro Glu Phe Gly Thr Glu Phe Val Arg
1075 1080 1085

Gly Met Leu Val Glu Thr Arg Pro Lys Ser Phe Ala Glu Leu Val Arg
1090 1095 1100

Ile Ser Gly Leu Ser His Gly Thr Asp Val Trp Leu Asn Asn Ala Arg
1105 1110 1115 1120

Asp Trp Ile Asn Leu Gly Tyr Ala Lys Leu Ser Glu Val Ile Ser Cys
1125 1130 1135

Arg Asp Asp Ile Met Asn Phe Leu Ile His Lys Gly Met Glu Pro Ser
1140 1145 1150

Leu Ala Phe Lys Ile Met Glu Asn Val Arg Lys Gly Lys Gly Ile Thr
1155 1160 1165

Glu Glu Met Glu Ser Glu Met Arg Arg Leu Lys Val Pro Glu Trp Phe
1170 1175 1180

Ile Glu Ser Cys Lys Arg Ile Lys Tyr Leu Phe Pro Lys Ala His Ala
1185 1190 1195 1200

Val Ala Tyr Val Ser Met Ala Phe Arg Ile Ala Tyr Phe Lys Val His
1205 1210 1215

Tyr Pro Leu Gln Phe Tyr Ala Ala Tyr Phe Thr Ile Lys Gly Asp Gln
1220 1225 1230

Phe Asp Pro Val Leu Val Leu Arg Gly Lys Glu Ala Ile Lys Arg Arg
1235 1240 1245

Leu Arg Glu Leu Lys Ala Met Pro Ala Lys Asp Ala Gln Lys Lys Asn
1250 1255 1260

Glu Val Ser Val Leu Glu Val Ala Leu Glu Met Ile Leu Arg Gly Phe
1265 1270 1275 1280

Ser Phe Leu Pro Pro Asp Ile Phe Lys Ser Asp Ala Lys Lys Phe Leu
1285 1290 1295

Ile Glu Gly Asn Ser Leu Arg Ile Pro Phe Asn Lys Leu Pro Gly Leu
1300 1305 1310

Gly Asp Ser Val Ala Glu Ser Ile Ile Arg Ala Arg Glu Glu Lys Pro
1315 1320 1325

Phe Thr Ser Val Glu Asp Leu Met Lys Arg Thr Lys Val Asn Lys Asn
1330 1335 1340

His Ile Glu Leu Met Lys Ser Leu Gly Val Leu Gly Asp Leu Pro Glu
1345 1350 1355 1360

Thr Glu Gln Phe Thr Leu Phe
1365



<210> 139 
<211> 567 
<212> DNA 

<213> Thermotoga maritima
<400> 139 

gtgctcgcca tgatatggaa cgacaccgtt ttttgcgtcg tagacacaga aaccacggga 60
accgatccct ttgccggaga ccggatagtt gaaatagccg ctgttcctgt cttcaagggg 120
aagatctaca gaaacaaagc gtttcactct ctcgtgaatc ccagaataag aatccctgcg 180
ctgattcaga aagttcacgg tatcagcaac atggacatcg tggaagcgcc agacatggac 240
acagtttacg atcttttcag ggattacgtg aagggaacgg tgctcgtgtt tcacaacgcc 300
aacttcgacc tcacttttct ggatatgatg gcaaaggaaa cgggaaactt tccaataacg 360
aatccctaca tcgacacact cgatctttca gaagagatct ttggaaggcc tcattctctc 420
aaatggctct ccgaaagact tggaataaaa accacgatac ggcaccgtgc tcttccagat 480
gccctggtga ccgcaagagt ttttgtgaag cttgttgaat ttcttggtga aaacagggtc 540
aacgaattca tacgtggaaa acggggg 567



<210> 140 
<211> 189 
<212> PRT 

<213> Thermotoga maritima
<400> 140 

Met Leu Ala Met Ile Trp Asn Asp Thr Val Phe Cys Val Val Asp Thr
1 5 10 15

Glu Thr Thr Gly Thr Asp Pro Phe Ala Gly Asp Arg Ile Val Glu Ile
20 25 30

Ala Ala Val Pro Val Phe Lys Gly Lys Ile Tyr Arg Asn Lys Ala Phe
35 40 45

His Ser Leu Val Asn Pro Arg Ile Arg Ile Pro Ala Leu Ile Gln Lys
50 55 60

Val His Gly Ile Ser Asn Met Asp Ile Val Glu Ala Pro Asp Met Asp
65 70 75 80

Thr Val Tyr Asp Leu Phe Arg Asp Tyr Val Lys Gly Thr Val Leu Val
85 90 95

Phe His Asn Ala Asn Phe Asp Leu Thr Phe Leu Asp Met Met Ala Lys
100 105 110

Glu Thr Gly Asn Phe Pro Ile Thr Asn Pro Tyr Ile Asp Thr Leu Asp
115 120 125

Leu Ser Glu Glu Ile Phe Gly Arg Pro His Ser Leu Lys Trp Leu Ser
130 135 140

Glu Arg Leu Gly Ile Lys Thr Thr Ile Arg His Arg Ala Leu Pro Asp
145 150 155 160

Ala Leu Val Thr Ala Arg Val Phe Val Lys Leu Val Glu Phe Leu Gly
165 170 175

Glu Asn Arg Val Asn Glu Phe Ile Arg Gly Lys Arg Gly
180 185



<210> 141 
<211> 1434 
<212> DNA 

<213> Thermotoga maritima
<400> 141 

gtggaagttc tttacaggaa gtacaggcca aagacttttt ctgaggttgt caatcaggat 60
catgtgaaga aggcaataat cggtgctatt cagaagaaca gcgtggccca cggatacata 120
ttcgccggtc cgaggggaac ggggaagact actcttgcca gaattctcgc aaaatccctg 180
aactgtgaga acagaaaggg agttgaaccc tgcaattcct gcagagcctg cagagagata 240
gacgagggaa ccttcatgga cgtgatagag ctcgacgcgg cctccaacag aggaatagac 300
gagatcagaa gaatcagaga cgccgttgga tacaggccga tggaaggtaa atacaaagtc 360
tacataatag acgaagttca catgctcacg aaagaagcct tcaacgcgct cctcaaaaca 420
ctcgaagaac ctccttccca cgtcgtgttc gtgctggcaa cgacaaacct tgagaaggtt 480
cctcccacga ttatctcgag atgtcaggtt ttcgagttca gaaacattcc cgacgagctc 540
atcgaaaaga ggctccagga agttgcggag gctgaaggaa tagagataga cagggaagct 600
ctgagcttca tcgcaaaaag agcctctgga ggcttgagag acgcgctcac catgctcgag 660
caggtgtgga agttctcgga aggaaagata gatctcgaga cggtacacag ggcgctcggg 720
ttgataccga tacaggttgt tcgcgattac gtgaacgcta tcttttctgg tgatgtgaaa 780
agggtcttca ccgttctcga cgacgtctat tacagcggga aggactacga ggtgctcatt 840
caggaagcag tcgaggatct ggtcgaagac ctggaaaggg agagaggggt ttaccaggtt 900
tcagcgaacg atatagttca ggtttcgaga caacttctga atcttctgag agagataaag 960
ttcgccgaag aaaaacgact cgtctgtaaa gtgggttcgg cttacatagc gacgaggttc 1020
tccaccacaa acgttcagga aaacgatgtc agagaaaaaa acgataattc aaatgtacag 1080
cagaaagaag agaagaaaga aacggtgaag gcaaaagaag aaaaacagga agacagcgag 1140
ttcgagaaac gcttcaaaga actcatggaa gaactgaaag aaaagggcga tctctctatc 1200
tttgtcgctc tcagcctctc agaggtgcag tttgacggag aaaaggtgat tatttctttt 1260
gattcatcga aagctatgca ttacgagttg atgaagaaaa aactgcctga gctggaaaac 1320
attttttcta gaaaactcgg gaaaaaagta gaagttgaac ttcgactgat gggaaaagaa 1380
gaaacaatcg agaaggtttc tcagaagatc ctgagattgt ttgaacagga ggga 1434



<210> 142 
<211> 478 
<212> PRT 

<213> Thermotoga maritima



<400> 142 

Met Glu Val Leu Tyr Arg Lys Tyr Arg Pro Lys Thr Phe Ser Glu Val
1 5 10 15

Val Asn Gln Asp His Val Lys Lys Ala Ile Ile Gly Ala Ile Gln Lys
20 25 30

Asn Ser Val Ala His Gly Tyr Ile Phe Ala Gly Pro Arg Gly Thr Gly
35 40 45

Lys Thr Thr Leu Ala Arg Ile Leu Ala Lys Ser Leu Asn Cys Glu Asn
50 55 60

Arg Lys Gly Val Glu Pro Cys Asn Ser Cys Arg Ala Cys Arg Glu Ile
65 70 75 80

Asp Glu Gly Thr Phe Met Asp Val Ile Glu Leu Asp Ala Ala Ser Asn
85 90 95

Arg Gly Ile Asp Glu Ile Arg Arg Ile Arg Asp Ala Val Gly Tyr Arg
100 105 110

Pro Met Glu Gly Lys Tyr Lys Val Tyr Ile Ile Asp Glu Val His Met
115 120 125

Leu Thr Lys Glu Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu Glu Pro
130 135 140

Pro Ser His Val Val Phe Val Leu Ala Thr Thr Asn Leu Glu Lys Val
145 150 155 160

Pro Pro Thr Ile Ile Ser Arg Cys Gln Val Phe Glu Phe Arg Asn Ile
165 170 175

Pro Asp Glu Leu Ile Glu Lys Arg Leu Gln Glu Val Ala Glu Ala Glu
180 185 190

Gly Ile Glu Ile Asp Arg Glu Ala Leu Ser Phe Ile Ala Lys Arg Ala
195 200 205

Ser Gly Gly Leu Arg Asp Ala Leu Thr Met Leu Glu Gln Val Trp Lys
210 215 220

Phe Ser Glu Gly Lys Ile Asp Leu Glu Thr Val His Arg Ala Leu Gly
225 230 235 240

Leu Ile Pro Ile Gln Val Val Arg Asp Tyr Val Asn Ala Ile Phe Ser
245 250 255

Gly Asp Val Lys Arg Val Phe Thr Val Leu Asp Asp Val Tyr Tyr Ser
260 265 270

Gly Lys Asp Tyr Glu Val Leu Ile Gln Glu Ala Val Glu Asp Leu Val
275 280 285

Glu Asp Leu Glu Arg Glu Arg Gly Val Tyr Gln Val Ser Ala Asn Asp
290 295 300

Ile Val Gln Val Ser Arg Gln Leu Leu Asn Leu Leu Arg Glu Ile Lys
305 310 315 320

Phe Ala Glu Glu Lys Arg Leu Val Cys Lys Val Gly Ser Ala Tyr Ile
325 330 335

Ala Thr Arg Phe Ser Thr Thr Asn Val Gln Glu Asn Asp Val Arg Glu
340 345 350

Lys Asn Asp Asn Ser Asn Val Gln Gln Lys Glu Glu Lys Lys Glu Thr
355 360 365

Val Lys Ala Lys Glu Glu Lys Gln Glu Asp Ser Glu Phe Glu Lys Arg
370 375 380

Phe Lys Glu Leu Met Glu Glu Leu Lys Glu Lys Gly Asp Leu Ser Ile
385 390 395 400

Phe Val Ala Leu Ser Leu Ser Glu Val Gln Phe Asp Gly Glu Lys Val
405 410 415

Ile Ile Ser Phe Asp Ser Ser Lys Ala Met His Tyr Glu Leu Met Lys
420 425 430

Lys Lys Leu Pro Glu Leu Glu Asn Ile Phe Ser Arg Lys Leu Gly Lys
435 440 445

Lys Val Glu Val Glu Leu Arg Leu Met Gly Lys Glu Glu Thr Ile Glu
450 455 460

Lys Val Ser Gln Lys Ile Leu Arg Leu Phe Glu Gln Glu Gly
465 470 475



<210> 143 
<211> 1098 
<212> DNA 

<213> Thermotoga maritima
<400> 143 

atgaaagtaa ccgtcacgac tcttgaattg aaagacaaaa taaccatcgc ctcaaaagcg 60
ctcgcaaaga aatccgtgaa acccattctt gctggatttc ttttcgaagt gaaagatgga 120
aatttctaca tctgcgcgac cgatctcgag accggagtca aagcaaccgt gaatgccgct 180
gaaatctccg gtgaggcacg ttttgtggta ccaggagatg tcattcagaa gatggtcaag 240
gttctcccag atgagataac ggaactttct ttagaggggg atgctcttgt tataagttct 300
ggaagcaccg ttttcaggat caccaccatg cccgcggacg aatttccaga gataacgcct 360
gccgagtctg gaataacctt cgaagttgac acttcgctcc tcgaggaaat ggttgaaaag 420
gtcatcttcg ccgctgccaa agacgagttc atgcgaaatc tgaatggagt tttctgggaa 480
ctccacaaga atcttctcag gctggttgca agtgatggtt tcagacttgc acttgctgaa 540
gagcagatag aaaacgagga agaggcgagt ttcttgctct ctttgaagag catgaaagaa 600
gttcaaaacg tgctggacaa cacaacggag ccgactataa cggtgaggta cgatggaaga 660
agggtttctc tgtcgacaaa tgatgtagaa acggtgatga gagtggtcga cgctgaattt 720
cccgattaca aaagggtgat ccccgaaact ttcaaaacga aagtggtggt ttccagaaaa 780
gaactcaggg aatctttgaa gagggtgatg gtgattgcca gcaagggaag cgagtccgtg 840
aagttcgaaa tagaagaaaa cgttatgaga cttgtgagca agagcccgga ttatggagaa 900
gtggtcgatg aagttgaagt tcaaaaagaa ggggaagatc tcgtgatcgc tttcaacccg 960
aagttcatcg aggacgtttt gaagcacatt gagactgaag aaatcgaaat gaacttcgtt 1020
gattctacca gtccatgtca gataaatcca ctcgatattt ctggatacct ttacatagtg 1080
atgcccatca gactggca 1098



<210> 144 
<211> 366 
<212> PRT

<213> Thermotoga maritima



<400> 144 

Met Lys Val Thr Val Thr Thr Leu Glu Leu Lys Asp Lys Ile Thr Ile
1 5 10 15

Ala Ser Lys Ala Leu Ala Lys Lys Ser Val Lys Pro Ile Leu Ala Gly
20 25 30

Phe Leu Phe Glu Val Lys Asp Gly Asn Phe Tyr Ile Cys Ala Thr Asp
35 40 45

Leu Glu Thr Gly Val Lys Ala Thr Val Asn Ala Ala Glu Ile Ser Gly
50 55 60

Glu Ala Arg Phe Val Val Pro Gly Asp Val Ile Gln Lys Met Val Lys
65 70 75 80

Val Leu Pro Asp Glu Ile Thr Glu Leu Ser Leu Glu Gly Asp Ala Leu
85 90 95

Val Ile Ser Ser Gly Ser Thr Val Phe Arg Ile Thr Thr Met Pro Ala
100 105 110

Asp Glu Phe Pro Glu Ile Thr Pro Ala Glu Ser Gly Ile Thr Phe Glu
115 120 125

Val Asp Thr Ser Leu Leu Glu Glu Met Val Glu Lys Val Ile Phe Ala
130 135 140

Ala Ala Lys Asp Glu Phe Met Arg Asn Leu Asn Gly Val Phe Trp Glu
145 150 155 160

Leu His Lys Asn Leu Leu Arg Leu Val Ala Ser Asp Gly Phe Arg Leu
165 170 175

Ala Leu Ala Glu Glu Gln Ile Glu Asn Glu Glu Glu Ala Ser Phe Leu
180 185 190

Leu Ser Leu Lys Ser Met Lys Glu Val Gln Asn Val Leu Asp Asn Thr
195 200 205

Thr Glu Pro Thr Ile Thr Val Arg Tyr Asp Gly Arg Arg Val Ser Leu
210 215 220

Ser Thr Asn Asp Val Glu Thr Val Met Arg Val Val Asp Ala Glu Phe
225 230 235 240

Pro Asp Tyr Lys Arg Val Ile Pro Glu Thr Phe Lys Thr Lys Val Val
245 250 255

Val Ser Arg Lys Glu Leu Arg Glu Ser Leu Lys Arg Val Met Val Ile
260 265 270

Ala Ser Lys Gly Ser Glu Ser Val Lys Phe Glu Ile Glu Glu Asn Val
275 280 285

Met Arg Leu Val Ser Lys Ser Pro Asp Tyr Gly Glu Val Val Asp Glu
290 295 300

Val Glu Val Gln Lys Glu Gly Glu Asp Leu Val Ile Ala Phe Asn Pro
305 310 315 320

Lys Phe Ile Glu Asp Val Leu Lys His Ile Glu Thr Glu Glu Ile Glu
325 330 335

Met Asn Phe Val Asp Ser Thr Ser Pro Cys Gln Ile Asn Pro Leu Asp
340 345 350

Ile Ser Gly Tyr Leu Tyr Ile Val Met Pro Ile Arg Leu Ala
355 360 365



<210> 145 
<211> 972 
<212> DNA 

<213> Thermotoga maritima
<400> 145 

atgccagtca cgtttctcac aggtactgca gaaactcaga aggaagaatt gataaagaaa 60
ctcctgaagg atggtaacgt ggagtacata aggatccatc cggaggatcc cgacaagatc 120
gatttcataa ggtctttact caggacaaag acgatctttt ccaacaagac gatcattgac 180
atcgtcaatt tcgatgagtg gaaagcacag gagcagaagc gtctcgttga acttttgaaa 240
aacgtaccgg aagacgttca tatcttcatc cgttctcaaa aaacaggtgg aaagggagta 300
gcgctggagc ttccgaagcc atgggaaacg gacaagtggc ttgagtggat agaaaagcgc 360
ttcagggaga atggtttgct catcgataaa gatgcccttc agctgttttt ctccaaggtt 420
ggaacgaacg acctgatcat agaaagggag attgaaaaac tgaaagctta ttccgaggac 480
agaaagataa cggtagaaga cgtggaagag gtcgttttta cctatcagac tccgggatac 540
gatgattttt gctttgctgt ttccgaagga aaaaggaagc tcgctcactc tcttctgtcg 600
cagctgtgga aaaccacaga gtccgtggtg attgccactg tccttgcgaa tcacttcttg 660
gatctcttca aaatcctcgt tcttgtgaca aagaaaagat actacacctg gcctgatgtg 720
tccagggtgt ccaaagagct gggaattccc gttcctcgtg tggctcgttt cctcggtttc 780
tcctttaaga cctggaaatt caaggtgatg aaccacctcc tctactacga tgtgaagaag 840
gttagaaaga tactgaggga tctctacgat ctggacagag ccgtgaaaag cgaagaagat 900
ccaaaaccgt tcttccacga gttcatagaa gaggtggcac tggatgtata ttctcttcag 960
agagatgaag aa 972



<210> 146 
<211> 324 
<212> PRT 

<213> Thermatoga maritima 
<400> 146 

Met Pro Val Thr Phe Leu Thr Gly Thr Ala Glu Thr Gln Lys Glu Glu
1 5 10 15

Leu Ile Lys Lys Leu Leu Lys Asp Gly Asn Val Glu Tyr Ile Arg Ile
20 25 30

His Pro Glu Asp Pro Asp Lys Ile Asp Phe Ile Arg Ser Leu Leu Arg
35 40 45

Thr Lys Thr Ile Phe Ser Asn Lys Thr Ile Ile Asp Ile Val Asn Phe
50 55 60

Asp Glu Trp Lys Ala Gln Glu Gln Lys Arg Leu Val Glu Leu Leu Lys
65 70 75 80

Asn Val Pro Glu Asp Val His Ile Phe Ile Arg Ser Gln Lys Thr Gly
85 90 95

Gly Lys Gly Val Ala Leu Glu Leu Pro Lys Pro Trp Glu Thr Asp Lys
100 105 110

Trp Leu Glu Trp Ile Glu Lys Arg Phe Arg Glu Asn Gly Leu Leu Ile
115 120 125

Asp Lys Asp Ala Leu Gln Leu Phe Phe Ser Lys Val Gly Thr Asn Asp
130 135 140

Leu Ile Ile Glu Arg Glu Ile Glu Lys Leu Lys Ala Tyr Ser Glu Asp
145 150 155 160

Arg Lys Ile Thr Val Glu Asp Val Glu Glu Val Val Phe Thr Tyr Gln
165 170 175

Thr Pro Gly Tyr Asp Asp Phe Cys Phe Ala Val Ser Glu Gly Lys Arg
180 185 190

Lys Leu Ala His Ser Leu Leu Ser Gln Leu Trp Lys Thr Thr Glu Ser
195 200 205

Val Val Ile Ala Thr Val Leu Ala Asn His Phe Leu Asp Leu Phe Lys
210 215 220



Ile Leu Val Leu Val Thr Lys Lys Arg Tyr Tyr Thr Trp Pro Asp Val
225 230 235 240

Ser Arg Val Ser Lys Glu Leu Gly Ile Pro Val Pro Arg Val Ala Arg
245 250 255

Phe Leu Gly Phe Ser Phe Lys Thr Trp Lys Phe Lys Val Met Asn His
260 265 270

Leu Leu Tyr Tyr Asp Val Lys Lys Val Arg Lys Ile Leu Arg Asp Leu
275 280 285

Tyr Asp Leu Asp Arg Ala Val Lys Ser Glu Glu Asp Pro Lys Pro Phe
290 295 300

Phe His Glu Phe Ile Glu Glu Val Ala Leu Asp Val Tyr Ser Leu Gln
305 310 315 320

Arg Asp Glu Glu



<210> 147 
<211> 936 
<212> DNA 

<213> Thermatoga maritima 
<400> 147 

atgaacgatt tgatcagaaa gtacgctaaa gatcaactgg aaactttgaa aaggatcata 60
gaaaagtctg aaggaatatc catcctcata aatggagaag atctctcgta tccgagagaa 120
gtatcccttg aacttcccga gtacgtggag aaatttcccc cgaaggcctc ggatgttctg 180
gagatagatc ccgaggggga gaacataggc atagacgaca tcagaacgat aaaggacttc 240
ctgaactaca gccccgagct ctacacgaga aagtacgtga tagtccacga ctgtgaaaga 300
atgacccagc aggcggcgaa cgcgtttctg aaggcccttg aagaaccacc agaatacgct 360
gtgatcgttc tgaacactcg ccgctggcat tatctactgc cgacgataaa gagccgagtg 420
ttcagagtgg ttgtgaacgt tccaaaggag ttcagagatc tcgtgaaaga gaaaatagga 480
gatctctggg aggaacttcc acttcttgag agagacttca aaacggctct cgaagcctac 540
aaacttggtg cggaaaaact ttctggattg atggaaagtc tcaaagtttt ggagacggaa 600
aaactcttga aaaaggtcct ttcaaaaggc ctcgaaggtt atctcgcatg tagggagctc 660
ctggagagat tttcaaaggt ggaatcgaag gaattctttg cgctttttga tcaggtgact 720
aacacgataa caggaaaaga cgcgtttctt ttgatccaga gactgacaag aatcattctc 780
cacgaaaaca catgggaaag cgttgaagat caaaaaagcg tgtctttcct cgattcaatt 840
ctcagggtga agatagcgaa tctgaacaac aaactcactc tgatgaacat cctcgcgata 900
cacagagaga gaaagagagg tgtcaacgct tggagc 936



<210> 148 
<211> 311 
<212> PRT 

<213> Thermatoga maritima 
<400> 148 

Met Asn Asp Leu Ile Arg Lys Tyr Ala Lys Asp Gln Leu Glu Thr Leu
1 5 10 15

Lys Arg Ile Ile Glu Lys Ser Glu Gly Ile Ser Ile Leu Ile Asn Gly
20 25 30

Glu Asp Leu Ser Tyr Pro Arg Glu Val Ser Leu Glu Leu Pro Glu Tyr
35 40 45

Val Glu Lys Phe Pro Pro Lys Ala Ser Asp Val Leu Glu Ile Asp Pro
50 55 60

Glu Gly Glu Asn Ile Gly Ile Asp Asp Ile Arg Thr Ile Lys Asp Phe
65 70 75 80

Leu Asn Tyr Ser Pro Glu Leu Tyr Thr Arg Lys Tyr Val Ile Val His
85 90 95

Asp Cys Glu Arg Met Thr Gln Gln Ala Ala Asn Ala Phe Leu Lys Ala
100 105 110

Leu Glu Glu Pro Pro Glu Tyr Ala Val Ile Val Leu Asn Thr Arg Arg
115 120 125

Trp His Tyr Leu Leu Pro Thr Ile Lys Ser Arg Val Phe Arg Val Val
130 135 140

Val Asn Val Pro Lys Glu Phe Arg Asp Leu Val Lys Glu Lys Ile Gly
145 150 155 160

Asp Leu Trp Glu Glu Leu Pro Leu Leu Glu Arg Asp Phe Lys Thr Ala
165 170 175

Leu Glu Ala Tyr Lys Leu Gly Ala Glu Lys Leu Ser Gly Leu Met Glu
180 185 190

Ser Leu Lys Val Leu Glu Thr Glu Lys Leu Leu Lys Lys Val Leu Ser
195 200 205

Lys Gly Leu Glu Gly Tyr Leu Ala Cys Arg Glu Leu Leu Glu Arg Phe
210 215 220

Ser Lys Val Glu Ser Lys Glu Phe Phe Ala Leu Phe Asp Gln Val Thr
225 230 235 240



Asn Thr Ile Thr Gly Lys Asp Ala Phe Leu Leu Ile Gln Arg Leu Thr
245 250 255

Arg Ile Ile Leu His Glu Asn Thr Trp Glu Ser Val Glu Asp Lys Ser
260 265 270

Val Ser Phe Leu Asp Ser Ile Leu Arg Val Lys Ile Ala Asn Leu Asn
275 280 285

Asn Lys Leu Thr Leu Met Asn Ile Leu Ala Ile His Arg Glu Arg Lys
290 295 300

Arg Gly Val Asn Ala Trp Ser
305 310



<210> 149 
<211> 423 
<212> DNA 

<213> Thermatoga maritima 
<400> 149 

atgtctttct tcaacaagat catactcata ggaagactcg tgagagatcc cgaagagaga 60

tacacgctca gcggaactcc agtcaccacc ttcaccatag cggtggacag ggttcccaga 120

aagaacgcgc cggacgacgc tcaaacgact gatttcttca ggatcgtcac ctttggaaga 180

ctggcagagt tcgctagaac ctatctcacc aaaggaaggc tcgttctcgt cgaaggtgaa 240

atgagaatga gaagatggga aacacccact ggagaaaaga gggtatctcc ggaggttgtc 300

gcaaacgttg ttagattcat ggacagaaaa cctgctgaaa cagttagcga gactgaagag 360

gagctggaaa taccggaaga agacttttcc agcgatacct tcagtgaaga tgaaccacca 420



<210> 150 
<211> 141 
<212> PRT 

<213> Thermatoga maritima 
<400> 150 

Met Ser Phe Phe Asn Lys Ile Ile Leu Ile Gly Arg Leu Val Arg Asp
1 5 10 15

Pro Glu Glu Arg Tyr Thr Leu Ser Gly Thr Pro Val Thr Thr Phe Thr
20 25 30

Ile Ala Val Asp Arg Val Pro Arg Lys Asn Ala Pro Asp Asp Ala Gln
35 40 45

Thr Thr Asp Phe Phe Arg Ile Val Thr Phe Gly Arg Leu Ala Glu Phe
50 55 60

Ala Arg Thr Tyr Leu Thr Lys Gly Arg Leu Val Leu Val Glu Gly Glu
65 70 75 80

Met Arg Met Arg Arg Trp Glu Thr Pro Thr Gly Glu Lys Arg Val Ser
85 90 95

Pro Glu Val Val Ala Asn Val Val Arg Phe Met Asp Arg Lys Pro Ala
100 105 110

Glu Thr Val Ser Glu Thr Glu Glu Glu Leu Glu Ile Pro Glu Glu Asp
115 120 125 

Phe Ser Ser Asp Thr Phe Ser Glu Asp Glu Pro Pro Phe 
130 135 140 



<210> 151 
<211> 1353 
<212> DNA 

<213> Thermatoga maritima 
<400> 151 

atgcgtgttc ccccgcacaa cttagaggcc gaagttgctg tgctcggaag catattgata 60
gatccgtcgg taataaacga cgttcttgaa attttgagcc acgaagattt ctatctgaaa 120
aaacaccaac acatcttcag agcgatggaa gagctttacg acgaaggaaa accggtggac 180
gtggtttccg tctgtgacaa gcttcaaagc atgggaaaac tcgaggaagt aggtggagat 240
ctggaagtgg cccagctcgc tgaggctgtg cccagttctg cacacgcact tcactacgcg 300
gagatcgtca aggaaaaatc cattctgagg aaactcattg agatctccag aaaaatctca 360
gaaagtgcct acatggaaga agatgtggag atcctgctcg acaacgcaga aaagatgatc 420
ttcgagatct cagagatgaa aacgacaaaa tcctacgatc atctgagagg catcatgcac 480
cgggtgtttg aaaacctgga gaacttcagg gaaagagcca accttataga acccggtgtg 540
ctcataacgg gactaccaac gggattcaaa agtctggaca aacagaccac agggttccac 600
agctccgatc tggtgataat agcagcgaga ccctccatgg gaaaaacctc cttcgcactc 660
tcaatagcga ggaacatggc tgtcaatttc gaaatccccg tcggaatatt cagtctcgag 720
atgtccaagg aacagctcgc tcaaagacta ctcagcatgg agtccggtgt ggatctttac 780
agcatcagaa caggatacct ggatcaggag aagtgggaaa gactcacaat agcggcttct 840
aaactctaca aagcacccat agttgtggac gatgagtcac tcctcgatcc gcgatcgttg 900
agggcaaaag cgagaaggat gaaaaaagaa tacgatgtaa aagccatttt tgtcgactat 960
ctccagctca tgcacctgaa aggaagaaaa gaaagcagac agcaggagat atccgagatc 1020
tcgagatctc tgaagctcct tgcgagggaa ctcgacatag tggtgatagc gctttcacag 1080
ctttcgaggg ccgtagaaca gagagaagac aaaagaccga ggctgagtga cctcagggaa 1140
tccggtgcga tagaacagga cgcagacaca gtcatcttca tctacaggga ggaatattac 1200
aggagcaaaa aatccaaaga ggaaagcaag cttcacgaac ctcacgaagc tgaaatcata 1260
ataggtaaac agagaaacgg tcccgttgga acgatcactc tgatcttcga ccccagaacg 1320
gttacgttcc atgaagtcga tgtggtgcat tca 1353



<210> 152 
<211> 451 
<212> PRT 

<213> Thermatoga maritima 
<400> 152 

Met Arg Val Pro Pro His Asn Leu Glu Ala Glu Val Ala Val Leu Gly
1 5 10 15

Ser Ile Leu Ile Asp Pro Ser Val Ile Asn Asp Val Leu Glu Ile Leu
20 25 30

Ser His Glu Asp Phe Tyr Leu Lys Lys His Gln His Ile Phe Arg Ala
35 40 45

Met Glu Glu Leu Tyr Asp Glu Gly Lys Pro Val Asp Val Val Ser Val
50 55 60

Cys Asp Lys Leu Gln Ser Met Gly Lys Leu Glu Glu Val Gly Gly Asp
65 70 75 80

Leu Glu Val Ala Gln Leu Ala Glu Ala Val Pro Ser Ser Ala His Ala
85 90 95

Leu His Tyr Ala Glu Ile Val Lys Glu Lys Ser Ile Leu Arg Lys Leu
100 105 110

Ile Glu Ile Ser Arg Lys Ile Ser Glu Ser Ala Tyr Met Glu Glu Asp
115 120 125

Val Glu Ile Leu Leu Asp Asn Ala Glu Lys Met Ile Phe Glu Ile Ser
130 135 140

Glu Met Lys Thr Thr Lys Ser Tyr Asp His Leu Arg Gly Ile Met His
145 150 155 160

Arg Val Phe Glu Asn Leu Glu Asn Phe Arg Glu Arg Ala Asn Leu Ile
165 170 175

Glu Pro Gly Val Leu Ile Thr Gly Leu Pro Thr Gly Phe Lys Ser Leu
180 185 190

Asp Lys Gln Thr Thr Gly Phe His Ser Ser Asp Leu Val Ile Ile Ala
195 200 205

Ala Arg Pro Ser Met Gly Lys Thr Ser Phe Ala Leu Ser Ile Ala Arg
210 215 220

Asn Met Ala Val Asn Phe Glu Ile Pro Val Gly Ile Phe Ser Leu Glu
225 230 235 240

Met Ser Lys Glu Gln Leu Ala Gln Arg Leu Leu Ser Met Glu Ser Gly
245 250 255

Val Asp Leu Tyr Ser Ile Arg Thr Gly Tyr Leu Asp Gln Glu Lys Trp
260 265 270

Glu Arg Leu Thr Ile Ala Ala Ser Lys Leu Tyr Lys Ala Pro Ile Val
275 280 285

Val Asp Asp Glu Ser Leu Leu Asp Pro Arg Ser Leu Arg Ala Lys Ala
290 295 300

Arg Arg Met Lys Lys Glu Tyr Asp Val Lys Ala Ile Phe Val Asp Tyr
305 310 315 320

Leu Gln Leu Met His Leu Lys Gly Arg Lys Glu Ser Arg Gln Gln Glu
325 330 335

Ile Ser Glu Ile Ser Arg Ser Leu Lys Leu Leu Ala Arg Glu Leu Asp
340 345 350

Ile Val Val Ile Ala Leu Ser Gln Leu Ser Arg Ala Val Glu Gln Arg
355 360 365

Glu Asp Lys Arg Pro Arg Leu Ser Asp Leu Arg Glu Ser Gly Ala Ile
370 375 380

Glu Gln Asp Ala Asp Thr Val Ile Phe Ile Tyr Arg Glu Glu Tyr Tyr
385 390 395 400

Arg Ser Lys Lys Ser Lys Glu Glu Ser Lys Leu His Glu Pro His Glu
405 410 415

Ala Glu Ile Ile Ile Gly Lys Gln Arg Asn Gly Pro Val Gly Thr Ile
420 425 430

Thr Leu Ile Phe Asp Pro Arg Thr Val Thr Phe His Glu Val Asp Val
435 440 445

Val His Ser
450



<210> 153 
<211> 1695 
<212> DNA 

<213> Thermatoga maritima
<400> 153 

gtgattcctc gagaggtcat cgaggaaata aaagaaaagg ttgacatcgt agaggtcatt 60
tccgagtacg tgaatcttac ccgggtaggt tcctcctaca gggctctctg tccctttcat 120
tcagaaacca atccttcttt ctacgttcat ccgggtttga agatatacca ttgtttcggc 180
tgcggtgcga gtggagacgt catcaaattt cttcaagaaa tggaagggat cagtttccag 240
gaagcgctgg aaagacttgc caaaagagct gggattgatc tttctctcta cagaacagaa 300
gggacttctg aatacggaaa atacattcgt ttgtacgaag aaacgtggaa aaggtacgtc 360
aaagagctgg agaaatcgaa agaggcaaaa gactatttaa aaagcagagg cttctctgaa 420
gaagatatag caaagttcgg ctttgggtac gtccccaaga gatccagcat ctctatagaa 480
gttgcagaag gcatgaacat aacactggaa gaacttgtca gatacggtat cgcgctgaaa 540
aagggtgatc gattcgttga tagattcgaa ggaagaatcg ttgttccaat aaagaacgac 600
agtggtcata ttgtggcttt tggtgggcgt gctctcggca acgaagaacc gaagtatttg 660
aactctccag agaccaggta tttttcgaag aagaagaccc tttttctctt cgatgaggcg 720
aaaaaagtgg caaaagaggt tggttttttc gtcatcaccg aaggctactt cgacgcgctc 780
gcattcagaa aggatggaat accaacggcg gtcgctgttc ttggggcgag tctttcaaga 840
gaggcgattc taaaactttc ggcgtattcg aaaaacgtca tactgtgttt cgataatgac 900
aaagcaggct tcagagccac tctcaaatcc ctcgaggatc tcctagacta cgaattcaac 960
gtgcttgtgg caaccccctc tccttacaaa gacccagatg aactctttca gaaagaagga 1020
gaaggttcat tgaaaaagat gctgaaaaac tcgcgttcgt tcgaatattt tctggtgacg 1080
gctggtgagg tcttctttga caggaacagc cccgcgggtg tgagatccta cctttctttc 1140
ctcaaaggtt gggtccaaaa gatgagaagg aaaggatatt tgaaacacat agaaaatctc 1200
gtgaatgagg tttcatcttc tctccagata ccagaaaacc agattttgaa cttttttgaa 1260
agcgacaggt ctaacactat gcctgttcat gagaccaagt cgtcaaaggt ttacgatgag 1320
gggagaggac tggcttattt gtttttgaac tacgaggatt tgagggaaaa gattctggaa 1380
ctggacttag aggtactgga agataaaaac gcgagggagt ttttcaagag agtctcactg 1440
ggagaagatt tgaacaaagt catagaaaac ttcccaaaag agctgaaaga ctggattttt 1500
gagacaatag aaagcattcc tcctccaaag gatcccgaga aattcctcgg tgacctctcc 1560
gaaaagttga aaatccgacg gatagagaga cgtatcgcag aaatagatga tatgataaag 1620
aaagcttcaa acgatgaaga aaggcgtctt cttctctcta tgaaagtgga tctcctcaga 1680
aaaataaaga ggagg 1695



<210> 154 
<211> 565 
<212> PRT 

<213> Thermatoga maritima 
<400> 154 

Met Ile Pro Arg Glu Val Ile Glu Glu Ile Lys Glu Lys Val Asp Ile
1 5 10 15

Val Glu Val Ile Ser Glu Tyr Val Asn Leu Thr Arg Val Gly Ser Ser
20 25 30

Tyr Arg Ala Leu Cys Pro Phe His Ser Glu Thr Asn Pro Ser Phe Tyr
35 40 45

Val His Pro Gly Leu Lys Ile Tyr His Cys Phe Gly Cys Gly Ala Ser
50 55 60

Gly Asp Val Ile Lys Phe Leu Gln Glu Met Glu Gly Ile Ser Phe Gln
65 70 75 80

Glu Ala Leu Glu Arg Leu Ala Lys Arg Ala Gly Ile Asp Leu Ser Leu
85 90 95

Tyr Arg Thr Glu Gly Thr Ser Glu Tyr Gly Lys Tyr Ile Arg Leu Tyr
100 105 110

Glu Glu Thr Trp Lys Arg Tyr Val Lys Glu Leu Glu Lys Ser Lys Glu
115 120 125

Ala Lys Asp Tyr Leu Lys Ser Arg Gly Phe Ser Glu Glu Asp Ile Ala
130 135 140

Lys Phe Gly Phe Gly Tyr Val Pro Lys Arg Ser Ser Ile Ser Ile Glu
145 150 155 160

Val Ala Glu Gly Met Asn Ile Thr Leu Glu Glu Leu Val Arg Tyr Gly
165 170 175

Ile Ala Leu Lys Lys Gly Asp Arg Phe Val Asp Arg Phe Glu Gly Arg
180 185 190

Ile Val Val Pro Ile Lys Asn Asp Ser Gly His Ile Val Ala Phe Gly
195 200 205

Gly Arg Ala Leu Gly Asn Glu Glu Pro Lys Tyr Leu Asn Ser Pro Glu
210 215 220

Thr Arg Tyr Phe Ser Lys Lys Lys Thr Leu Phe Leu Phe Asp Glu Ala
225 230 235 240

Lys Lys Val Ala Lys Glu Val Gly Phe Phe Val Ile Thr Glu Gly Tyr
245 250 255

Phe Asp Ala Leu Ala Phe Arg Lys Asp Gly Ile Pro Thr Ala Val Ala
260 265 270

Val Leu Gly Ala Ser Leu Ser Arg Glu Ala Ile Leu Lys Leu Ser Ala
275 280 285



Tyr Ser Lys Asn Val Ile Leu Cys Phe Asp Asn Asp Lys Ala Gly Phe
290 295 300

Arg Ala Thr Leu Lys Ser Leu Glu Asp Leu Leu Asp Tyr Glu Phe Asn
305 310 315 320

Val Leu Val Ala Thr Pro Ser Pro Tyr Lys Asp Pro Asp Glu Leu Phe
325 330 335

Gln Lys Glu Gly Glu Gly Ser Leu Lys Lys Met Leu Lys Asn Ser Arg
340 345 350

Ser Phe Glu Tyr Phe Leu Val Thr Ala Gly Glu Val Phe Phe Asp Arg
355 360 365

Asn Ser Pro Ala Gly Val Arg Ser Tyr Leu Ser Phe Leu Lys Gly Trp
370 375 380

Val Gln Lys Met Arg Arg Lys Gly Tyr Leu Lys His Ile Glu Asn Leu
385 390 395 400

Val Asn Glu Val Ser Ser Ser Leu Gln Ile Pro Glu Asn Gln Ile Leu
405 410 415

Asn Phe Phe Glu Ser Asp Arg Ser Asn Thr Met Pro Val His Glu Thr
420 425 430

Lys Ser Ser Lys Val Tyr Asp Glu Gly Arg Gly Leu Ala Tyr Leu Phe
435 440 445

Leu Asn Tyr Glu Asp Leu Arg Glu Lys Ile Leu Glu Leu Asp Leu Glu
450 455 460

Val Leu Glu Asp Lys Asn Ala Arg Glu Phe Phe Lys Arg Val Ser Leu
465 470 475 480

Gly Glu Asp Leu Asn Lys Val Ile Glu Asn Phe Pro Lys Glu Leu Lys
485 490 495

Asp Trp Ile Phe Glu Thr Ile Glu Ser Ile Pro Pro Pro Lys Asp Pro
500 505 510

Glu Lys Phe Leu Gly Asp Leu Ser Glu Lys Leu Lys Ile Arg Arg Ile
515 520 525

Glu Arg Arg Ile Ala Glu Ile Asp Asp Met Ile Lys Lys Ala Ser Asn
530 535 540

Asp Glu Glu Arg Arg Leu Leu Leu Ser Met Lys Val Asp Leu Leu Arg
545 550 555 560

Lys Ile Lys Arg Arg
565



<210> 155 
<211> 804 
<212> DNA 

<213> Thermus thermophilus 
<400> 155 

atggctctac acccggctca ccctggggca ataatcgggc acgaggccgt tctcgccctc 60

cttccccgcc tcaccgccca gaccctgctc ttctccggcc ccgagggggt ggggcggcgc 120

accgtggccc gctggtacgc ctgggggctc aaccgcggct tccccccgcc ctccctgggg 180

gagcacccgg acgtcctcga ggtggggccc aaggcccggg acctccgggg ccgggccgag 240

gtgcggctgg aggaggtggc gcccctcttg gagtggtgct ccagccaccc ccgggagcgg 300

gtgaaggtgg ccatcctgga ctcggcccac ctcctcaccg aggccgccgc caacgccctc 360

ctcaagctcc tggaggagcc cccttcctac gcccgcatcg tcctcatcgc cccaagccgc 420

gccaccctcc tccccaccct ggcctcccgg gccacggagg tggcattcgc ccccgtgccc 480

gaggaggccc tgcgcgccct cacccaggac ccggagctcc tccgctacgc cgccggggcc 540

ccgggccgcc tccttagggc cctccaggac ccggaggggt accgggcccg catggccagg 600

gcgcaaaggg tcctgaaagc cccgcccctg gagcgcctcg ctttgcttcg ggagcttttg 660

gccgaggagg agggggtcca cgccctccac gccgtcctaa agcgcccgga gcacctcctt 720
gccctggagc gggcgcggga ggccctggag gggtacgtga gccccgagct ggtcctcgcc 780
cggctggcct tagacttaga gaca 804



<210> 156 
<211> 268 
<212> PRT 

<213> Thermus thermophilus 
<400> 156 

Met Ala Leu His Pro Ala His Pro Gly Ala Ile Ile Gly His Glu Ala
1 5 10 15

Val Leu Ala Leu Leu Pro Arg Leu Thr Ala Gln Thr Leu Leu Phe Ser
20 25 30

Gly Pro Glu Gly Val Gly Arg Arg Thr Val Ala Arg Trp Tyr Ala Trp
35 40 45

Gly Leu Asn Arg Gly Phe Pro Pro Pro Ser Leu Gly Glu His Pro Asp
50 55 60

Val Leu Glu Val Gly Pro Lys Ala Arg Asp Leu Arg Gly Arg Ala Glu
65 70 75 80

Val Arg Leu Glu Glu Val Ala Pro Leu Leu Glu Trp Cys Ser Ser His
85 90 95

Pro Arg Glu Arg Val Lys Val Ala Ile Leu Asp Ser Ala His Leu Leu
100 105 110

Thr Glu Ala Ala Ala Asn Ala Leu Leu Lys Leu Leu Glu Glu Pro Pro
115 120 125

Ser Tyr Ala Arg Ile Val Leu Ile Ala Pro Ser Arg Ala Thr Leu Leu
130 135 140

Pro Thr Leu Ala Ser Arg Ala Thr Glu Val Ala Phe Ala Pro Val Pro
145 150 155 160

Glu Glu Ala Leu Arg Ala Leu Thr Gln Asp Pro Glu Leu Leu Arg Tyr
165 170 175

Ala Ala Gly Ala Pro Gly Arg Leu Leu Arg Ala Leu Gln Asp Pro Glu
180 185 190

Gly Tyr Arg Ala Arg Met Ala Arg Ala Gln Arg Val Leu Lys Ala Pro
195 200 205 

Pro Leu Glu Arg Leu Ala Leu Leu Arg Glu Leu Leu Ala Glu Glu Glu 
210 215 220 

Gly Val His Ala Leu His Ala Val Leu Lys Arg Pro Glu His Leu Leu 
225 230 235 240 

Ala Leu Glu Arg Ala Arg Glu Ala Leu Glu Gly Tyr Val Ser Pro Glu 
245 250 255 

Leu Val Leu Ala Arg Leu Ala Leu Asp Leu Glu Thr 
260 265 



<210> 157 
<211> 729 
<212> DNA 

<213> Thermus thermophilus
<400> 157

atgctggacc tgagggaggt gggggaggcg gagtggaagg ccctaaagcc ccttttggaa 60
agcgtgcccg agggcgtccc cgtcctcctc ctggacccta agccaagccc ctcccgggcg 120
gccttctacc ggaaccggga aaggcgggac ttccccaccc ccaaggggaa ggacctggtg 180
cggcacctgg aaaaccgggc caagcgcctg gggctcaggc tcccgggcgg ggtggcccag 240
tacctggcct ccctggaggg ggacctcgag gccctggagc gggagctgga gaagcttgcc 300
ctcctctccc cacccctcac cctggagaag gtggagaagg tggtggccct gaggcccccc 360
ctcacgggct ttgacctggt gcgctccgtc ctggagaagg accccaagga ggccctcctg 420
cgcctaggcg gcctcaagga ggagggggag gagcccctca ggctcctcgg ggccctctcc 480
tggcagttcg ccctcctcgc ccgggccttc ttcctcctcc gggaaaaccc caggcccaag 540
gaggaggacc tcgcccgcct cgaggcccac ccctacgccg cccgccgcgc cctggaggcg 600
gcgaagcgcc tcacggaaga ggccctcaag gaggccctgg acgccctcat ggaggcggaa 660
aagagggcca agggggggaa agacccgtgg ctcgccctgg aggcggcggt cctccgcctc 720
gcccgttga 729



<210> 158 
<211> 292 
<212> PRT 

<213> Thermus thermophilus
<400> 158 

Met Val Ile Ala Phe Thr Gly Asp Pro Phe Leu Ala Arg Glu Ala Leu
1 5 10 15

Leu Glu Glu Ala Arg Leu Arg Gly Leu Ser Arg Phe Thr Glu Pro Thr
20 25 30

Pro Glu Ala Leu Ala Gln Ala Leu Ala Pro Gly Leu Phe Gly Gly Gly
35 40 45

Gly Ala Met Leu Asp Leu Arg Glu Val Gly Glu Ala Glu Trp Lys Ala
50 55 60

Leu Lys Pro Leu Leu Glu Ser Val Pro Glu Gly Val Pro Val Leu Leu
65 70 75 80

Leu Asp Pro Lys Pro Ser Pro Ser Arg Ala Ala Phe Tyr Arg Asn Arg
85 90 95

Glu Arg Arg Asp Phe Pro Thr Pro Lys Gly Lys Asp Leu Val Arg His
100 105 110

Leu Glu Asn Arg Ala Lys Arg Leu Gly Leu Arg Leu Pro Gly Gly Val
115 120 125

Ala Gln Tyr Leu Ala Ser Leu Glu Gly Asp Leu Glu Ala Leu Glu Arg
130 135 140

Glu Leu Glu Lys Leu Ala Leu Leu Ser Pro Pro Leu Thr Leu Glu Lys
145 150 155 160

Val Glu Lys Val Val Ala Leu Arg Pro Pro Leu Thr Gly Phe Asp Leu
165 170 175

Val Arg Ser Val Leu Glu Lys Asp Pro Lys Glu Ala Leu Leu Arg Leu
180 185 190

Gly Gly Leu Lys Glu Glu Gly Glu Glu Pro Leu Arg Leu Leu Gly Ala
195 200 205

Leu Ser Trp Gln Phe Ala Leu Leu Ala Arg Ala Phe Phe Leu Leu Arg
210 215 220

Glu Asn Pro Arg Pro Lys Glu Glu Asp Leu Ala Arg Leu Glu Ala His
225 230 235 240

Pro Tyr Ala Ala Arg Arg Ala Leu Glu Ala Ala Lys Arg Leu Thr Glu
245 250 255

Glu Ala Leu Lys Glu Ala Leu Asp Ala Leu Met Glu Ala Glu Lys Arg
260 265 270

Ala Lys Gly Gly Lys Asp Pro Trp Leu Ala Leu Glu Ala Ala Val Leu
275 280 285

Arg Leu Ala Arg
290



<210> 159 
<211> 37 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<400> 159 

gtgtgtcata tgagtaagga tttcgtccac cttcacc 



<210> 160 
<211> 34 
<212> DNA 

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer



<400> 160 

gtgtgtggat ccggggacta ctcggaagta aggg 

<210> 161 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 161 

gtgtgtcata tggaaaccac aatattccag ttccag 



<210> 162 
<211> 39 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 162 

gtgtgtggat ccttatccac catgagaagt atttttcac 



<210> 163 
<211> 41 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer
<400> 163 

gtgtgtcata tggaaaaagt tttttttgga aaaaactcca g 



<210> 164 
<211> 35 
<212> DNA 

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer 
<400> 164 

gtgtgtggat ccttaatccg cctgaacggc taacg 



<210> 165 
<211> 41 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 165 

gtgtgtcata tgaactacgt tcccttcgcg agaaagtaca g 



<210> 166 
<211> 36 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 166 

gtgtgtggat ccttaaaaca gcctcgtccc gctgga 



<210> 167 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 167 

gtgtgtcata tgcgcgttaa ggtggacagg gag 



<210> 168 
<211> 35 
<212> DNA 

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer
<400> 168

tgtgtctcga gtcatggcta caccctcatc ggcat 



<210> 169 
<211> 47 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 169 

gtgtgtcata tgctcaataa ggtttttata ataggaagac ttacggg 



<210> 170 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<400> 170 

gtgtggatcc ttaaaaaggt atttcgtcct cttcatcgg 



<210> 171 
<211> 807 
<212> DNA 

<213> Thermus thermophilus 



<400> 171 

atggctcgag gcctgaaccg cgttttcctc atcggcgccc tcgccacccg gccggacatg 60
cgctacaccc cggcggggct cgccattttg gacctgaccc tcgccggtca ggacctgctt 120
ctttccgata acggggggga accggaggtg tcctggtacc accgggtgag gctcttaggc 180
cgccaggcgg agatgtgggg cgacctcttg gaccaagggc agctcgtctt cgtggagggc 240
cgcctggagt accgccagtg ggaaagggag ggggagaagc ggagcgagct ccagatccgg 300
gccgacttcc ggaccccctg gacgaccggg ggaagaagcg ggcggaggac agccggggcc 360
agcccaggct ccgcgccgcc ctgaaccagg tcttcctcat gggcaacctg acccgggacc 420
cggaactccg ctacaccccc cagggcaccg cggtggcccg gctgggcctg gcggtgaacg 480
agcgccgcca gggggcggag gagcgcaccc acttcgtgga ggttcaggcc tggcgcgacc 540
tggcggagtg ggccgccgag ctgaggaagg gcgacggcct tttcgtgatc ggcaggttgg 600
tgaacgactc ctggaccagc tccagcggcg agcggcgctt ccagacccgt gtggaggccc 660
tcaggctgga gcgccccacc cgtggacctg cccaggcctg cccaggccgg cggaacaggt 720
cccgcgaagt ccagacgggt ggggtggaca ttgacgaagg cttggaagac tttccgccgg 780
aggaggattt gccgttttga gcacgaa 807



<210> 172 
<211> 266 
<212> PRT 

<213> Thermus thermophilus 
<400> 172 

Met Ala Arg Gly Leu Asn Arg Val Phe Leu Ile Gly Ala Leu Ala Thr
1 5 10 15

Arg Pro Asp Met Arg Tyr Thr Pro Ala Gly Leu Ala Ile Leu Asp Leu
20 25 30

Thr Leu Ala Gly Gln Asp Leu Leu Leu Ser Asp Asn Gly Gly Glu Pro
35 40 45

Glu Val Ser Trp Tyr His Arg Val Arg Leu Leu Gly Arg Gln Ala Glu
50 55 60

Met Trp Gly Asp Leu Leu Asp Gln Gly Gln Leu Val Phe Val Glu Gly
65 70 75 80

Arg Leu Glu Tyr Arg Gln Trp Glu Arg Glu Gly Glu Lys Arg Ser Glu
85 90 95

Leu Gln Ile Arg Ala Asp Phe Leu Asp Pro Leu Asp Asp Arg Gly Lys
100 105 110

Lys Arg Ala Glu Asp Ser Arg Gly Gln Pro Arg Leu Arg Ala Ala Leu
115 120 125

Asn Gln Val Phe Leu Met Gly Asn Leu Thr Arg Asp Pro Glu Leu Arg
130 135 140

Tyr Thr Pro Gln Gly Thr Ala Val Ala Arg Leu Gly Leu Ala Val Asn
145 150 155 160

Glu Arg Arg Gln Gly Ala Glu Glu Arg Thr His Phe Val Glu Val Gln
165 170 175 

Ala Trp Arg Asp Leu Ala Glu Trp Ala Ala Glu Leu Arg Lys Gly Asp 
180 185 190

Gly Leu Phe Val Ile Gly Arg Leu Val Asn Asp Ser Trp Thr Ser Ser
195 200 205

Ser Gly Glu Arg Arg Phe Gln Thr Arg Val Glu Ala Leu Arg Leu Glu
210 215 220

Arg Pro Thr Arg Gly Pro Ala Gln Ala Cys Pro Gly Arg Arg Asn Arg
225 230 235 240

Ser Arg Glu Val Gln Thr Gly Gly Val Asp Ile Asp Glu Gly Leu Glu
245 250 255

Asp Phe Pro Pro Glu Glu Asp Leu Pro Phe
260 265



<210> 173 
<211> 992 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 173 

aattccgaca tttcaattga atcgtttatt ccgcttgaaa aagaaggcaa gttgctcgtt 60
gatgtgaaaa gaccggggag catcgtactg caggcgcgct ttttctctga aatcgtgaaa 120
aaactgccgc aacaaacggt ggaaatcgaa acggaagaca actttttgac gatcatccgc 180
tcggggcact cagaattccg cctcaatggg ctaaacgccg acgaatatcc gcgcctgccg 240
caaattgaag aagaaaacgt gtttcaaatc ccggctgatt tattgaaaac cgtgattcgg 300
caaacggtgt tcgccgtttc tacatcggaa acgcgcccaa tcttgacagg tgtcaactgg 360
aaagttgaac atggcgagct tgtctgcaca gcgaccgaca gtcatcgctt agccatgcgc 420
aaagtgaaaa ttgagtcgga aaatgaagta tcatacaacg tcgtcatccc tggaaaaagt 480
cttaatgagc tcagcaaaat tttggatgac ggcaaccacc cggtggacat cgtcatgaca 540
gccaatcaag tgctatttaa ggccgagcac cttctcttct tttcccggct gcttgacggc 600
aactatccgg agacggcccg cttgattcca acagaaagca aaacgaccat gatcgtcaat 660
gcaaaagagt ttctgcaggc aatcgaccga gcgtccttgc ttgctcgaga aggaaggaac 720
aacgttgtga aactgacgac gcttcctgga ggaatgctcg aaatttcttc gatttctccg 780
agatcgggaa agtgacggag cagctgcaaa cggagtctct tgaaggggaa gagttgaaca 840
tttcgttcag cgcgaaatat atgatggacg cgttgcgggc gcttgatgga acagacattt 900
caaatcagct tcactggggc catgcggccg ttcctgttgc gcccgcttca accgattcga 960
tgcttcagct cattttgccg gtgagaacat at 992



<210> 174 
<211> 334 
<212> PRT 

<213> Bacillus stearothermophilus 
<400> 174 

Asn Ser Asp Ile Ser Ile Ile Glu Ser Phe Ile Pro Leu Glu Lys Glu
1 5 10 15

Gly Lys Leu Leu Val Asp Val Lys Arg Pro Gly Ser Ile Val Leu Gln
20 25 30

Ala Arg Phe Phe Ser Glu Ile Val Lys Lys Leu Pro Gln Gln Thr Val
35 40 45

Glu Ile Glu Thr Glu Asp Asn Phe Leu Thr Ile Ile Arg Ser Gly His
50 55 60

Ser Glu Phe Arg Leu Asn Gly Leu Asn Ala Asp Glu Tyr Pro Arg Leu
65 70 75 80

Pro Gln Ile Glu Glu Glu Asn Val Phe Gln Ile Pro Ala Asp Leu Leu
85 90 95

Lys Thr Val Ile Arg Gln Thr Val Phe Ala Val Ser Thr Ser Glu Thr
100 105 110

Arg Pro Ile Leu Thr Gly Val Asn Trp Lys Val Glu His Gly Glu Leu
115 120 125

Val Cys Thr Ala Thr Asp Ser His Arg Leu Ala Met Arg Lys Val Lys
130 135 140

Ile Ile Glu Ser Glu Asn Glu Val Ser Tyr Asn Val Val Ile Pro Gly
145 150 155 160

Lys Ser Leu Asn Glu Leu Ser Lys Ile Ile Leu Asp Asp Gly Asn His
165 170 175

Pro Val Asp Ile Val Met Thr Ala Asn Gln Val Leu Phe Lys Ala Glu
180 185 190

His Leu Leu Phe Phe Ser Arg Leu Leu Asp Gly Asn Tyr Pro Glu Thr
195 200 205

Ala Arg Leu Ile Pro Thr Glu Ser Lys Thr Thr Met Ile Val Asn Ala
210 215 220

Lys Glu Phe Leu Gln Ala Ile Asp Arg Ala Ser Leu Leu Ala Arg Glu
225 230 235 240

Gly Arg Asn Asn Val Val Lys Leu Thr Thr Leu Pro Gly Gly Met Leu
245 250 255 

Glu Ile Ser Ser Ile Ser Pro Glu Ile Gly Lys Val Thr Glu Gln Leu
260 265 270

Gln Thr Glu Ser Leu Glu Gly Glu Glu Leu Asn Ile Ser Phe Ser Ala
275 280 285

Lys Tyr Met Met Asp Ala Leu Arg Ala Leu Asp Gly Thr Asp Ile Gln
290 295 300

Ile Ser Phe Thr Gly Ala Met Arg Pro Phe Leu Leu Arg Pro Leu His
305 310 315 320

Thr Asp Ser Met Leu Gln Leu Ile Leu Pro Val Arg Thr Tyr
325 330



<210> 175 
<211> 492 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 175 

atgattaacc gcgtcatttt ggtcggcagg ttaacgagag atccggagtt gcgttacact 60
ccaagcggag tggctgttgc cacgtttacg ctcgcggtca accgtccgtt tacaaatcag 120
cagggcgagc gggaaacgga ttttattcaa tgtgtcgttt ggcgccgcca ggcggaaaac 180
gtcgccaact ttttgaaaaa ggggagcttg gctggtgtcg atggccgact gcaaacccgc 240
agctatgaaa atcaagaagg tcggcgtgtg tacgtgacgg aagtggtggc tgatagcgtc 300
caatttcttg agccgaaagg aacgagcgag cagcgagggg cgacagcagg cggctactat 360
ggggatccat tcccattcgg gcaagatcag aaccaccaat atccgaacga aaaagggttt 420
ggccgcatcg atgacgatcc tttcgccaat gacggccagc cgatcgatat ttctgatgat 480
gatttgccgt tt 492



<210> 176 
<211> 164 
<212> PRT 

<213> Bacillus stearothermophilus 
<400> 176 

Met Ile Asn Arg Val Ile Leu Val Gly Arg Leu Thr Arg Asp Pro Glu
1 5 10 15

Leu Arg Tyr Thr Pro Ser Gly Val Ala Val Ala Thr Phe Thr Leu Ala
20 25 30

Val Asn Arg Pro Phe Thr Asn Gln Ser Tyr Glu Asn Gln Glu Gly Arg
35 40 45

Arg Val Tyr Val Thr Glu Val Val Ala Asp Ser Val Gln Phe Leu Glu
50 55 60

Pro Lys Gly Thr Ser Glu Gln Arg Gly Ala Thr Ala Gly Gly Tyr Tyr
65 70 75 80

Gln Gly Glu Arg Glu Thr Asp Phe Ile Gln Cys Val Val Trp Arg Arg
85 90 95

Gln Ala Glu Asn Val Ala Asn Phe Leu Lys Lys Gly Ser Leu Ala Gly
100 105 110

Val Asp Gly Arg Leu Gln Thr Arg Gly Asp Pro Phe Pro Phe Gly Gln
115 120 125

Asp Gln Asn His Gln Tyr Pro Asn Glu Lys Gly Phe Gly Arg Ile Asp
130 135 140

Asp Asp Pro Phe Ala Asn Asp Gly Gln Pro Ile Asp Ile Ser Asp Asp
145 150 155 160

Asp Leu Pro Phe



<210> 177 
<211> 1044 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 177 

atgctggaac gcgtatgggg aaacattgaa aaacggcgtt tttctcccct ttatttatta 60 
tacggcaatg agccgttttt attaacggaa acgtatgagc gattggtgaa cgcagcgctt 12 0 
ggccccgagg agcgggagtg gaacttggct gtgtacgact gcgaggaaac gccgatcgag 180 
gcggcgcttg aggaggccga gacggtgccg tttttcggcg agcggcgtgt cattctcatc 24 0 
aagcatccat atttttttac gtctgaaaaa gagaaggaga tcgaacatga tttggcgaag 300 
ctggaggcgt acttgaaggc gccgtcgccg ttttcgatcg tcgtcttttt cgcgccgtac 360 
gagaagcttg atgagcgaaa aaaaattacg aagctcgcca aagagcaaag cgaagtcgtc 420 
atcgccgccc cgctcgccga agcggagctg cgtgcctggg tgcggcgccg catcgagagc 480 
caaggggcgc aagcaagcga cgaggcgatt gatgtcctgt tgcggcgggc cgggacgcag 54 0 
ctttccgcct tggcgaatga aatcgataaa ttggccctgt ttgccggatc gggcggaacc 600 
atcgaggcgg cggcggttga gcggcttgtc gcccgcacgc cggaagaaaa cgtatttgtg 6 60 
cttgtcgagc aagtggcgaa gcgcgacatt ccagcagcgt tgcagacgtt ttatgatctg 7 20 
cttgaaaaca atgaagagcc gatcaaaatt ttggcgttgc tcgccgccca tttccgcttg 780 
ctttcgcaag tgaaatggct tgcctcctta ggctacggac aggcgcaaat tgctgcggcg 84 0 
ctcaaggtgc acccgttccg cgtcaagctc gctcttgctc aagcggcccg cttcgctgac 9 00 
ggagagcttg ctgaggcgat caacgagctc gctgacgccg attacgaagt gaaaagcggg 9 60 
gcggtcgatc gccggttggc cgttgagctg cttctgatgc gctggggcgc ccgcccggcg 1020 
caagcggggc gccacggccg gcgg 1044 
<210> 178
<211> 348 
<212> PRT 

<213> Bacillus stearothermophilus 
<400> 178 

Met Leu Glu Arg Val Trp Gly Asn Ile Glu Lys Arg Arg Phe Ser Pro
1 5 10 15

Leu Tyr Leu Leu Tyr Gly Asn Glu Pro Phe Leu Leu Thr Glu Thr Tyr
20 25 30

Glu Arg Leu Val Asn Ala Ala Leu Gly Pro Glu Glu Arg Glu Trp Asn
35 40 45

Leu Ala Val Tyr Asp Cys Glu Glu Thr Pro Ile Glu Ala Ala Leu Glu
50 55 60

Glu Ala Glu Thr Val Pro Phe Phe Gly Glu Arg Arg Val Ile Leu Ile
65 70 75 80

Lys His Pro Tyr Phe Phe Thr Ser Glu Lys Glu Lys Glu Ile Glu His
85 90 95

Asp Leu Ala Lys Leu Glu Ala Tyr Leu Lys Ala Pro Ser Pro Phe Ser
100 105 110

Ile Val Val Phe Phe Ala Pro Tyr Glu Lys Leu Asp Glu Arg Lys Lys
115 120 125

Ile Thr Lys Leu Ala Lys Glu Gln Ser Glu Val Val Ile Ala Ala Pro
130 135 140

Leu Ala Glu Ala Glu Leu Arg Ala Trp Val Arg Arg Arg Ile Glu Ser
145 150 155 160

Gln Gly Ala Gln Ala Ser Asp Glu Ala Ile Asp Val Leu Leu Arg Arg
165 170 175

Ala Gly Thr Gln Leu Ser Ala Leu Ala Asn Glu Ile Asp Lys Leu Ala
180 185 190

Leu Phe Ala Gly Ser Gly Gly Thr Ile Glu Ala Ala Ala Val Glu Arg
195 200 205

Leu Val Ala Arg Thr Pro Glu Glu Asn Val Phe Val Leu Val Glu Gln
210 215 220

Val Ala Lys Arg Asp Ile Pro Ala Ala Leu Gln Thr Phe Tyr Asp Leu
225 230 235 240

Leu Glu Asn Asn Glu Glu Pro Ile Lys Ile Leu Ala Leu Leu Ala Ala
245 250 255

His Phe Arg Leu Leu Ser Gln Val Lys Trp Leu Ala Ser Leu Gly Tyr
260 265 270

Gly Gln Ala Gln Ile Ala Ala Ala Leu Lys Val His Pro Phe Arg Val
275 280 285

Lys Leu Ala Leu Ala Gln Ala Ala Arg Phe Ala Asp Gly Glu Leu Ala
290 295 300

Glu Ala Ile Asn Glu Leu Ala Asp Ala Asp Tyr Glu Val Lys Ser Gly
305 310 315 320

Ala Val Asp Arg Arg Leu Ala Val Glu Leu Leu Leu Met Arg Trp Gly
325 330 335

Ala Arg Pro Ala Gln Ala Gly Arg His Gly Arg Arg
340 345



<210> 179 
<211> 757 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 179 

atgcgatggg aacagctagc gaaacgccag ccggtggtgg cgaaaatgct gcaaagcggc 60
ttggaaaaag ggcggatttc tcatgcgtac ttgtttgagg ggcagcgggg gacgggcaaa 120
aaagcggcca gtttgttgtt ggcgaaacgt ttgttttgtc tgtccccaat cggagtttcc 180
ccgtgtctag agtgccgcaa ctgccggcgc atcgactccg gcaaccaccc tgacgtccgg 240
gtgatcggcc cagatggagg atcaatcaaa aaggaacaaa tcgaatggct gcagcaagag 300
ttctcgaaaa cagcggtcga gtcggataaa aaaatgtaca tcgttgagca cgccgatcaa 360
atgacgacaa gcgctgccaa cagccttctg aaatttttgg aagagccgca tccggggacg 420
gtggcggtat tgctgactga gcaataccac cgcctgctag ggacgatcgt ttcccgctgt 480
caagtgcttt cgttccggcc gttgccgccg gcagagctcg cccagggact tgtcgaggag 540
cacgtgccgt tgccgttggc gctgttggct gcccatttga caaacagctt cgaggaagca 600
ctggcgcttg ccaaagatag ttggtttgcc gaggcgcgaa cattagtgct acaatggtat 660
gagatgctgg gcaagccgga gctgcagctt ttgtttttca tccacgaccg cttgtttccg 720
cattttttgg aaagccatca gcttgacctt ggacttg 757
<210> 180
<211> 252 
<212> PRT 

<213> Bacillus stearothermophilus
<400> 180 

Met Arg Trp Glu Gln Leu Ala Lys Arg Gln Pro Val Val Ala Lys Met
1 5 10 15

Leu Gln Ser Gly Leu Glu Lys Gly Arg Ile Ser His Ala Tyr Leu Phe
20 25 30

Glu Gly Gln Arg Gly Thr Gly Lys Lys Ala Ala Ser Leu Leu Leu Ala
35 40 45

Lys Arg Leu Phe Cys Leu Ser Pro Ile Gly Val Ser Pro Cys Leu Glu
50 55 60

Cys Arg Asn Cys Arg Arg Ile Asp Ser Gly Asn His Pro Asp Val Arg
65 70 75 80

Val Ile Gly Pro Asp Gly Gly Ser Ile Lys Lys Glu Gln Ile Glu Trp
85 90 95

Leu Gln Gln Glu Phe Ser Lys Thr Ala Val Glu Ser Asp Lys Lys Met
100 105 110

Tyr Ile Val Glu His Ala Asp Gln Met Thr Thr Ser Ala Ala Asn Ser
115 120 125

Leu Leu Lys Phe Leu Glu Glu Pro His Pro Gly Thr Val Ala Val Leu
130 135 140

Leu Thr Glu Gln Tyr His Arg Leu Leu Gly Thr Ile Val Ser Arg Cys
145 150 155 160

Gln Val Leu Ser Phe Arg Pro Leu Pro Pro Ala Glu Leu Ala Gln Gly
165 170 175

Leu Val Glu Glu His Val Pro Leu Pro Leu Ala Leu Leu Ala Ala His
180 185 190

Leu Thr Asn Ser Phe Glu Glu Ala Leu Ala Leu Ala Lys Asp Ser Trp
195 200 205

Phe Ala Glu Ala Arg Thr Leu Val Leu Gln Trp Tyr Glu Met Leu Gly
210 215 220

Lys Pro Glu Leu Gln Leu Leu Phe Phe Ile His Asp Arg Leu Phe Pro
225 230 235 240

His Phe Leu Glu Ser His Gln Leu Asp Leu Gly Leu
245 250



<210> 181 
<211> 1677 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 181 

gtggcatacc aagcgttata tcgcgtgttt cggccgcagc gctttgcgga catggtcggc 60
caagaacacg tgaccaagac gttgcaaagc gccctgcttc aacataaaat atcgcacgct 120
tacttatttt ccggcccgcg cggtacagga aaaacgagcg cagcgaaaat tttcgccaag 180 
gcggtcaact gtgaacaggc gccagcggcg gagccatgca atgagtgtcc agcttgcctc 240 
ggcattacga atggaacggt tcccgatgtg ctggaaattg acgctgcttc caacaaccgc 300
gtcgatgaaa ttcgtgatat ccgtgagaag gtgaaatttg cgccaacgtc ggcccgctac 360
aaagtgtata tcatcgacga ggtgcatatg ctgtcgatcg gtgcgtttaa cgcgctgttg 420
aaaacgttgg aggagccgcc gaaacacgtc attttcattt tggccacgac cgagccgcac 480 
aaaattccgg cgacgatcat ttcccgctgc caacggttcg attttcgccg catcccgctt 540 
caggcgatcg tttcacggct aaagtacgtc gcaagcgccc aaggtgtcga ggcgtcagat 600
gaggcattgt ccgccatcgc ccgtgctgca gacgggggga tgcgcgatgc gctcagcttg 660 
cttgatcaag ccatttcgtt cagcgacggg aaacttcggc tcgacgacgt gctggcgatg 720
accggggctg catcatttgc cgccttatcg agcttcatcg aagccatcca ccgcaaagat 780 
acagcggcgg ttcttcagca cttggaaacg atgatggcgc aagggaaaga tccgcatcgt 840
ttggttgaag acttgatttt gtactatcgc gatttattgc tgtacaaaac cgctccctat 900 
gtggagggag cgattcaaat tgctgtcgtt gacgaagcgt tcacttcact gtcggaaatg 960 
attccggttt ccaatttata cgaggccatc gagttgctga acaaaagcca gcaagagatg 1020
aagtggacaa accacccgcg ccttctgttg gaagtggcgc ttgtgaaact ttgccatcca 1080 
tcagccgccg ccccgtcgct gtcggcttcc gagttggaac cgttgataaa gcggattgaa 1140
acgctggagg cggaattgcg gcgcctgaag gaacaaccgc ctgcccctcc gtcgaccgcc 1200 
gcgccggtga aaaaactgtc caaaccgatg aaaacggggg gatataaagc cccggttggc 1260
cgcatttacg agctgttgaa acaggcgacg catgaagatt tagctttggt gaaaggatgc 1320
tgggcggatg tgctcgacac gttgaaacgg cagcataaag tgtcgcacgc tgccttgctg 1380
caagagagcg agccggttgc agcgagcgcc tcagcgtttg tattaaaatt caaatacgaa 1440
atccactgca aaatggcgac cgatcccaca agttcggtca aagaaaacgt cgaagcgatt 1500 
ttgtttgagc tgacaaaccg ccgctttgaa atggtagcca ttccggaggg agaatgggga 1560 
aaaataagag aagagttcat ccgcaataag gacgccatgg tggaaaaaag cgaagaagat 1620
ccgttaatcg ccgaagcgaa gcggctgttt ggcgaagagc tgatcgaaat taaagaa 1677 



<210> 182 
<211> 559 
<212> PRT 

<213> Bacillus stearothermophilus 
<400> 182

Val Ala Tyr Gln Ala Leu Tyr Arg Val Phe Arg Pro Gln Arg Phe Ala
1 5 10 15

Asp Met Val Gly Gln Glu His Val Thr Lys Thr Leu Gln Ser Ala Leu
20 25 30

Leu Gln His Lys Ile Ser His Ala Tyr Leu Phe Ser Gly Pro Arg Gly
35 40 45

Thr Gly Lys Thr Ser Ala Ala Lys Ile Phe Ala Lys Ala Val Asn Cys
50 55 60

Glu Gln Ala Pro Ala Ala Glu Pro Cys Asn Glu Cys Pro Ala Cys Leu
65 70 75 80

Gly Ile Thr Asn Gly Thr Val Pro Asp Val Leu Glu Ile Asp Ala Ala
85 90 95

Ser Asn Asn Arg Val Asp Glu Ile Arg Asp Ile Arg Glu Lys Val Lys
100 105 110

Phe Ala Pro Thr Ser Ala Arg Tyr Lys Val Tyr Ile Ile Asp Glu Val
115 120 125

His Met Leu Ser Ile Gly Ala Phe Asn Ala Leu Leu Lys Thr Leu Glu
130 135 140

Glu Pro Pro Lys His Val Ile Phe Ile Leu Ala Thr Thr Glu Pro His
145 150 155 160

Lys Ile Pro Ala Thr Ile Ile Ser Arg Cys Gln Arg Phe Asp Phe Arg
165 170 175

Arg Ile Pro Leu Gln Ala Ile Val Ser Arg Leu Lys Tyr Val Ala Ser
180 185 190

Ala Gln Gly Val Glu Ala Ser Asp Glu Ala Leu Ser Ala Ile Ala Arg
195 200 205

Ala Ala Asp Gly Gly Met Arg Asp Ala Leu Ser Leu Leu Asp Gln Ala
210 215 220

Ile Ser Phe Ser Asp Gly Lys Leu Arg Leu Asp Asp Val Leu Ala Met
225 230 235 240

Thr Gly Ala Ala Ser Phe Ala Ala Leu Ser Ser Phe Ile Glu Ala Ile
245 250 255

His Arg Lys Asp Thr Ala Ala Val Leu Gln His Leu Glu Thr Met Met
260 265 270

Ala Gln Gly Lys Asp Pro His Arg Leu Val Glu Asp Leu Ile Leu Tyr
275 280 285

Tyr Arg Asp Leu Leu Leu Tyr Lys Thr Ala Pro Tyr Val Glu Gly Ala
290 295 300

Ile Gln Ile Ala Val Val Asp Glu Ala Phe Thr Ser Leu Ser Glu Met
305 310 315 320

Ile Pro Val Ser Asn Leu Tyr Glu Ala Ile Glu Leu Leu Asn Lys Ser
325 330 335

Gln Gln Glu Met Lys Trp Thr Asn His Pro Arg Leu Leu Leu Glu Val
340 345 350

Ala Leu Val Lys Leu Cys His Pro Ser Ala Ala Ala Pro Ser Leu Ser
355 360 365

Ala Ser Glu Leu Glu Pro Leu Ile Lys Arg Ile Glu Thr Leu Glu Ala
370 375 380

Glu Leu Arg Arg Leu Lys Glu Gln Pro Pro Ala Pro Pro Ser Thr Ala
385 390 395 400

Ala Pro Val Lys Lys Leu Ser Lys Pro Met Lys Thr Gly Gly Tyr Lys
405 410 415

Ala Pro Val Gly Arg Ile Tyr Glu Leu Leu Lys Gln Ala Thr His Glu
420 425 430

Asp Leu Ala Leu Val Lys Gly Cys Trp Ala Asp Val Leu Asp Thr Leu
435 440 445

Lys Arg Gln His Lys Val Ser His Ala Ala Leu Leu Gln Glu Ser Glu
450 455 460

Pro Val Ala Ala Ser Ala Ser Ala Phe Val Leu Lys Phe Lys Tyr Glu
465 470 475 480

Ile His Cys Lys Met Ala Thr Asp Pro Thr Ser Ser Val Lys Glu Asn
485 490 495

Val Glu Ala Ile Leu Phe Glu Leu Thr Asn Arg Arg Phe Glu Met Val
500 505 510

Ala Ile Pro Glu Gly Glu Trp Gly Lys Ile Arg Glu Glu Phe Ile Arg
515 520 525

Asn Lys Asp Ala Met Val Glu Lys Ser Glu Glu Asp Pro Leu Ile Ala
530 535 540

Glu Ala Lys Arg Leu Phe Gly Glu Glu Leu Ile Glu Ile Lys Glu
545 550 555



<210> 183 
<211> 4301 
<212> DNA 

<213> Bacillus stearothermophilus 
<400> 183 

atggtgacaa aagagcaaaa agagcggttt ctcatcctgc ttgagcagct gaagatgacg 60
tcggacgaat ggatgccgca ttttcgtgag gcagccattc gcaaagtcgt gatcgataaa 120
gaggagaaaa gctggcattt ttattttcag ttcgacaacg tgctgccggt tcatgtatac 180
aaaacgtttg ccgatcggct gcagacggcg ttccgccata tcgccgccgt ccgccatacg 240
atggaggtcg aagcgccgcg cgtaactgag gcggatgtgc aggcgtattg gccgctttgc 300
cttgccgagc tgcaagaagg catgtcgccg cttgtcgatt ggctcagccg gcagacgcct 360
gagctgaaag gaaacaagct gcttgtcgtt gcccgccatg aagcggaagc gctggcgatc 420
aaacggcggt tcgccaaaaa aatcgctgat gtgtacgctt cgtttgggtt tccccccctt 480
cagcttgacg tcagcgtcga gccgtccaag caagaaatgg aacagttttt ggcgcaaaaa 540
cagcaagagg acgaagagcg agcgcttgct gtactgaccg atttagcgag ggaagaagaa 600
aaggccgcgt ctgcgccgcc gtccggtccg cttgtcatcg gctatccgat ccgcgacgag 660
gagccggtgc ggcggcttga aacgatcgtc gaagaagagc ggcgcgtcgt tgtgcaaggc 720
tatgtatttg acgccgaagt gagcgaatta aaaagcggcc gcacgctgtt gaccatgaaa 780
atcacagatt acacgaactc gattttagtc aaaatgttct cgcgcgacaa agaggacgcc 840
gagcttatga gcggcgtcaa aaaaggcatg tgggtgaaag tgcgcggcag cgtgcaaaac 900
gatacgttcg tccgtgattt ggtcatcatc gccaacgatt tgaacgaaat cgccgcaaac 960
gaacggcaag atacggcgcc ggaaggggaa aagagggtcg agctccattt gcataccccg 1020
atgagccaaa tggacgcggt cacctcggtg acaaaactca ttgagcaagc gaaaaaatgg 1080
gggcatccgg cgatcgccgt caccgaccat gccgttgttc agtcgtttcc ggaggcctac 1140
agcgcggcga aaaaacacgg catgaaggtc atttacggcc ttgaggcgaa catcgtcgac 1200
gatggcgtgc cgatcgccta caatgagacg caccgccgtc tttcggagga aacgtacgtc 1260
gtctttgacg tcgagacgac gggcctgtcg gctgtgtaca atacgatcat tgagctggcg 1320
gcggtgaaag tgaaagacgg cgagatcatc gaccgattca tgtcgtttgc caaccctgga 1380
catccgttgt cggtgacaac gatggagctg actgggatca ccgatgagat ggtgaaagac 1440
gccccgaagc cggacgaggt gctagcccgt tttgttgact gggccggcga tgcgacgctt 1500
gttgcccaca acgccagctt tgacatcggt tttttaaacg cgggcctcgc tcgcatgggg 1560
cgcggcaaaa tcgcgaatcc agtcatcgat acgctcgagc tggcccgttt tttatacccg 1620
gatttgaaaa accatcggct caatacattg tgcaaaaaat ttgacattga attgacgcag 1680
catcaccgcg ccatctacga cgcggaggcg accgggcatt tgcttatgcg gctgttgaag 1740
gaagcggaag agcgcggcat actgtttcat gacgaattaa acagccgcac gcacagcgaa 1800
gcgtcctatc ggcttgcgcg cccgttccat gtgacgctgt tggcgcaaaa cgagactgga 1860
ttgaaaaatt tgttcaagct tgtgtcattg tcgcacattc aatattttca ccgtgtgccg 1920
cgcatcccgc gctccgtgct cgtcaagcac cgcgacggcc tgcttgtcgg ctcgggctgc 1980
gacaaaggag agctgtttga caacttgatc caaaaggcgc cggaagaagt cgaagacatc 2040
gcccgttttt acgattttct tgaagtgcat ccgccggacg tgtacaagcc gctcatcgag 2100
atggattatg tgaaagacga agagatgatc aaaaacatca tccgcagcat cgtcgccctt 2160
ggtgagaagc ttgacatccc ggttgtcgcc actggcaacg tccattactt gaacccagaa 2220
gataaaattt accggaaaat cttaatccat tcgcaaggcg gggcgaatcc gctcaaccgc 2280
catgaactgc cggatgtata tttccgtacg acgaatgaaa tgcttgactg cttctcgttt 2340
ttagggccgg aaaaagcgaa ggaaatcgtc gttgacaaca cgcaaaaaat cgcttcgtta 2400
atcggcgatg tcaagccgat caaagatgag ctgtatacgc cgcgcattga aggggcggac 2460
gaggaaatca gggaaatgag ctaccggcgg gcgaaggaaa tttacggcga cccgttgccg 2520
aaacttgttg aagagcggct tgagaaggag ctaaaaagca tcatcggcca tggctttgcc 2580
gtcatttatt tgatctcgca caagcttgtg aaaaaatcgc tcgatgacgg ctaccttgtc 2640
gggtcgcgcg gatcggtcgg ctcgtcgttt gtcgcgacga tgacggaaat caccgaggtc 2700
aatccgctgc cgccgcatta cgtttgcccg aactgcaagc attcggagtt ctttaacgac 2760
ggttcagtcg gctcagggtt tgatttgccg gataaaaact gcccgcgatg tgggacgaaa 2820
tacaagaaag acgggcacga catcccgttt gagacgtttc tcggctttaa aggcgacaaa 2880
gtgccggata tcgacttgaa cttttccggc gaataccagc cgcgcgccca caactatacg 2940
aaagtgctgt ttggcgaaga caacgtctac cgcgccggga cgattggcac ggtcgctgac 3000
aaaacggcgt acggatttgt caaagcgtat gcgagcgacc ataacttaga gctgcgcggc 3060
gcggaaatcg acggctcgcg gctggctgca ccggggtgaa gcggacgacc gggcagcatc 3120
cgggcggcat catcgtcgtc ccggattata tggaaattta cgattttacg ccgattcaat 3180
atccggccga tgacacgtcc tctgaatggc ggacgaccca tttcgacttc cattcgatcc 3240
acgacaattt gttgaagctc gatattctcg ggcacgacga tccgacggtc attcgcatgc 3300
tgcaagattt aagcggcatc gatccgaaaa cgatcccgac cgacgacccg gatgtgatgg 3360
gcattttcag cagcaccgag ccgcttggcg ttacgccgga gcaaatcatg tgcaatgtcg 3420
gcacgatcgg cattccggag tttggcacgc gcttcgttcg gcaaatgttg gaagagacaa 3480
ggccaaaaac gttttccgaa ctcgtgcaaa tttccggctt gtcgcacggc accgatgtgt 3540
ggctcggcaa cgcgcaagag ctcattcaaa acggcacgtg tacgttatcg gaagtcatcg 3600
gctgccgcga cgacattatg gtctatttga tttaccgcgg gctcgagccg tcgctcgctt 3660
ttaaaatcat ggaatccgtg cgcaaaggaa aaggcttaac gccggagttt gaagcagaaa 3720
tgcgcaaaca tgacgtgccg gagtggtaca tcgattcatg caaaaaaatc aagtacatgt 3780
tcccgaaagc gcacgccgcc gcctacgtgt taatggcggt gcgcatcgcc tactttaagg 3840
tgcaccatcc gcttttgtat tacgcgtcgt actttacggt gcgggcggag gactttgacc 3900
ttgacgccat gatcaaagga tcacccgcca ttcgcaagcg gattgaggaa atcaacgcca 3960
aaggcattca ggcgacggcg aaagaaaaaa gcttgctcac ggttcttgag gtggccttag 4020
agatgtgcga gcgcggcttt tcctttaaaa atatcgattt gtaccgctcg caggcgacgg 4080
aattcgtcat tgacggcaat tctctcattc cgccgttcaa cgccattccg gggcttggga 4140
cgaacgtggc gcaggcgatc gtgcgcgccc gcgaggaagg cgagtttttg tcgaaggagg 4200
atttgcaaca gcgcggcaaa ttgtcgaaaa cgctgctcga gtatctagaa agccgcggct 4260
gccttgactc gcttccagac cataaccagc tgtcgctgtt t 4301



<210> 184 
<211> 1433 
<212> PRT 

<213> Bacillus stearothermophilus
<400> 184

Met Val Thr Lys Glu Gln Lys Glu Arg Phe Leu Ile Leu Leu Glu Gln
1 5 10 15

Leu Lys Met Thr Ser Asp Glu Trp Met Pro His Phe Arg Glu Ala Ala
20 25 30

Ile Arg Lys Val Val Ile Asp Lys Glu Glu Lys Ser Trp His Phe Tyr
35 40 45

Phe Gln Phe Asp Asn Val Leu Pro Val His Val Tyr Lys Thr Phe Ala
50 55 60

Asp Arg Leu Gln Thr Ala Phe Arg His Ile Ala Ala Val Arg His Thr
65 70 75 80

Met Glu Val Glu Ala Pro Arg Val Thr Glu Ala Asp Val Gln Ala Tyr
85 90 95

Trp Pro Leu Cys Leu Ala Glu Leu Gln Glu Gly Met Ser Pro Leu Val
100 105 110

Asp Trp Leu Ser Arg Gln Thr Pro Glu Leu Lys Gly Asn Lys Leu Leu
115 120 125

Val Val Ala Arg His Glu Ala Glu Ala Leu Ala Ile Lys Arg Arg Phe
130 135 140

Ala Lys Lys Ile Ala Asp Val Tyr Ala Ser Phe Gly Phe Pro Pro Leu
145 150 155 160

Gln Leu Asp Val Ser Val Glu Pro Ser Lys Gln Glu Met Glu Gln Phe
165 170 175

Leu Ala Gln Lys Gln Gln Glu Asp Glu Glu Arg Ala Leu Ala Val Leu
180 185 190

Thr Asp Leu Ala Arg Glu Glu Glu Lys Ala Ala Ser Ala Pro Pro Ser
195 200 205

Gly Pro Leu Val Ile Gly Tyr Pro Ile Arg Asp Glu Glu Pro Val Arg
210 215 220

Arg Leu Glu Thr Ile Val Glu Glu Glu Arg Arg Val Val Val Gln Gly
225 230 235 240

Tyr Val Phe Asp Ala Glu Val Ser Glu Leu Lys Ser Gly Arg Thr Leu
245 250 255
Leu Thr Met Lys Ile Thr Asp Tyr Thr Asn Ser Ile Leu Val Lys Met
260 265 270

Phe Ser Arg Asp Lys Glu Asp Ala Glu Leu Met Ser Gly Val Lys Lys
275 280 285

Gly Met Trp Val Lys Val Arg Gly Ser Val Gln Asn Asp Thr Phe Val
290 295 300

Arg Asp Leu Val Ile Ile Ala Asn Asp Leu Asn Glu Ile Ala Ala Asn
305 310 315 320

Glu Arg Gln Asp Thr Ala Pro Glu Gly Glu Lys Arg Val Glu Leu His
325 330 335

Leu His Thr Pro Met Ser Gln Met Asp Ala Val Thr Ser Val Thr Lys
340 345 350

Leu Ile Glu Gln Ala Lys Lys Trp Gly His Pro Ala Ile Ala Val Thr
355 360 365

Asp His Ala Val Val Gln Ser Phe Pro Glu Ala Tyr Ser Ala Ala Lys
370 375 380

Lys His Gly Met Lys Val Ile Tyr Gly Leu Glu Ala Asn Ile Val Asp
385 390 395 400

Asp Gly Val Pro Ile Ala Tyr Asn Glu Thr His Arg Arg Leu Ser Glu
405 410 415

Glu Thr Tyr Val Val Phe Asp Val Glu Thr Thr Gly Leu Ser Ala Val
420 425 430

Tyr Asn Thr Ile Ile Glu Leu Ala Ala Val Lys Val Lys Asp Gly Glu
435 440 445

Ile Ile Asp Arg Phe Met Ser Phe Ala Asn Pro Gly His Pro Leu Ser
450 455 460

Val Thr Thr Met Glu Leu Thr Gly Ile Thr Asp Glu Met Val Lys Asp
465 470 475 480

Ala Pro Lys Pro Asp Glu Val Leu Ala Arg Phe Val Asp Trp Ala Gly
485 490 495

Asp Ala Thr Leu Val Ala His Asn Ala Ser Phe Asp Ile Gly Phe Leu
500 505 510

Asn Ala Gly Leu Ala Arg Met Gly Arg Gly Lys Ile Ala Asn Pro Val
515 520 525



Ile Asp Thr Leu Glu Leu Ala Arg Phe Leu Tyr Pro Asp Leu Lys Asn
530 535 540

His Arg Leu Asn Thr Leu Cys Lys Lys Phe Asp Ile Glu Leu Thr Gln
545 550 555 560

His His Arg Ala Ile Tyr Asp Ala Glu Ala Thr Gly His Leu Leu Met
565 570 575

Arg Leu Leu Lys Glu Ala Glu Glu Arg Gly Ile Leu Phe His Asp Glu
580 585 590

Leu Asn Ser Arg Thr His Ser Glu Ala Ser Tyr Arg Leu Ala Arg Pro
595 600 605

Phe His Val Thr Leu Leu Ala Gln Asn Glu Thr Gly Leu Lys Asn Leu
610 615 620

Phe Lys Leu Val Ser Leu Ser His Ile Gln Tyr Phe His Arg Val Pro
625 630 635 640

Arg Ile Pro Arg Ser Val Leu Val Lys His Arg Asp Gly Leu Leu Val
645 650 655

Gly Ser Gly Cys Asp Lys Gly Glu Leu Phe Asp Asn Leu Ile Gln Lys
660 665 670

Ala Pro Glu Glu Val Glu Asp Ile Ala Arg Phe Tyr Asp Phe Leu Glu
675 680 685

Val His Pro Pro Asp Val Tyr Lys Pro Leu Ile Glu Met Asp Tyr Val
690 695 700

Lys Asp Glu Glu Met Ile Lys Asn Ile Ile Arg Ser Ile Val Ala Leu
705 710 715 720

Gly Glu Lys Leu Asp Ile Pro Val Val Ala Thr Gly Asn Val His Tyr
725 730 735

Leu Asn Pro Glu Asp Lys Ile Tyr Arg Lys Ile Leu Ile His Ser Gln
740 745 750

Gly Gly Ala Asn Pro Leu Asn Arg His Glu Leu Pro Asp Val Tyr Phe
755 760 765

Arg Thr Thr Asn Glu Met Leu Asp Cys Phe Ser Phe Leu Gly Pro Glu
770 775 780



Lys Ala Lys Glu Ile Val Val Asp Asn Thr Gln Lys Ile Ala Ser Leu
785 790 795 800

Ile Gly Asp Val Lys Pro Ile Lys Asp Glu Leu Tyr Thr Pro Arg Ile
805 810 815

Glu Gly Ala Asp Glu Glu Ile Arg Glu Met Ser Tyr Arg Arg Ala Lys
820 825 830

Glu Ile Tyr Gly Asp Pro Leu Pro Lys Leu Val Glu Glu Arg Leu Glu
835 840 845

Lys Glu Leu Lys Ser Ile Ile Gly His Gly Phe Ala Val Ile Tyr Leu
850 855 860

Ile Ser His Lys Leu Val Lys Lys Ser Leu Asp Asp Gly Tyr Leu Val
865 870 875 880

Gly Ser Arg Gly Ser Val Gly Ser Ser Phe Val Ala Thr Met Thr Glu
885 890 895

Ile Thr Glu Val Asn Pro Leu Pro Pro His Tyr Val Cys Pro Asn Cys
900 905 910

Lys His Ser Glu Phe Phe Asn Asp Gly Ser Val Gly Ser Gly Phe Asp
915 920 925

Leu Pro Asp Lys Asn Cys Pro Arg Cys Gly Thr Lys Tyr Lys Lys Asp
930 935 940

Gly His Asp Ile Pro Phe Glu Thr Phe Leu Gly Phe Lys Gly Asp Lys
945 950 955 960

Val Pro Asp Ile Asp Leu Asn Phe Ser Gly Glu Tyr Gln Pro Arg Ala
965 970 975

His Asn Tyr Thr Lys Val Leu Phe Gly Glu Asp Asn Val Tyr Arg Ala
980 985 990

Gly Thr Ile Gly Thr Val Ala Asp Lys Thr Ala Tyr Gly Phe Val Lys
995 1000 1005

Ala Tyr Ala Ser Asp His Asn Leu Glu Leu Arg Gly Ala Glu Ile Asp
1010 1015 1020

Leu Ala Ala Gly Cys Thr Gly Val Lys Arg Thr Thr Gly Gln His Pro
1025 1030 1035 1040

Gly Gly Ile Ile Val Val Pro Asp Tyr Met Glu Ile Tyr Asp Phe Thr
1045 1050 1055

Pro Ile Gln Tyr Pro Ala Asp Asp Thr Ser Ser Glu Trp Arg Thr Thr
1060 1065 1070

His Phe Asp Phe His Ser Ile His Asp Asn Leu Leu Lys Leu Asp Ile
1075 1080 1085

Leu Gly His Asp Asp Pro Thr Val Ile Arg Met Leu Gln Asp Leu Ser
1090 1095 1100

Gly Ile Asp Pro Lys Thr Ile Pro Thr Asp Asp Pro Asp Val Met Gly
1105 1110 1115 1120

Ile Phe Ser Ser Thr Glu Pro Leu Gly Val Thr Pro Glu Gln Ile Met
1125 1130 1135

Cys Asn Val Gly Thr Ile Gly Ile Pro Glu Phe Gly Thr Arg Phe Val
1140 1145 1150

Arg Gln Met Leu Glu Glu Thr Arg Pro Lys Thr Phe Ser Glu Leu Val
1155 1160 1165

Gln Ile Ser Gly Leu Ser His Gly Thr Asp Val Trp Leu Gly Asn Ala
1170 1175 1180

Gln Glu Leu Ile Gln Asn Gly Thr Cys Thr Leu Ser Glu Val Ile Gly
1185 1190 1195 1200

Cys Arg Asp Asp Ile Met Val Tyr Leu Ile Tyr Arg Gly Leu Glu Pro
1205 1210 1215

Ser Leu Ala Phe Lys Ile Met Glu Ser Val Arg Lys Gly Lys Gly Leu
1220 1225 1230

Thr Pro Glu Phe Glu Ala Glu Met Arg Lys His Asp Val Pro Glu Trp
1235 1240 1245

Tyr Ile Asp Ser Cys Lys Lys Ile Lys Tyr Met Phe Pro Lys Ala His
1250 1255 1260

Ala Ala Ala Tyr Val Leu Met Ala Val Arg Ile Ala Tyr Phe Lys Val
1265 1270 1275 1280

His His Pro Leu Leu Tyr Tyr Ala Ser Tyr Phe Thr Val Arg Ala Glu
1285 1290 1295

Asp Phe Asp Leu Asp Ala Met Ile Lys Gly Ser Pro Ala Ile Arg Lys
1300 1305 1310

Arg Ile Glu Glu Ile Asn Ala Lys Gly Ile Gln Ala Thr Ala Lys Glu
1315 1320 1325

Lys Ser Leu Leu Thr Val Leu Glu Val Ala Leu Glu Met Cys Glu Arg
1330 1335 1340

Gly Phe Ser Phe Lys Asn Ile Asp Leu Tyr Arg Ser Gln Ala Thr Glu
1345 1350 1355 1360

Phe Val Ile Asp Gly Asn Ser Leu Ile Pro Pro Phe Asn Ala Ile Pro
1365 1370 1375

Gly Leu Gly Thr Asn Val Ala Gln Ala Ile Val Arg Ala Arg Glu Glu
1380 1385 1390

Gly Glu Phe Leu Ser Lys Glu Asp Leu Gln Gln Arg Gly Lys Leu Ser
1395 1400 1405

Lys Thr Leu Leu Glu Tyr Leu Glu Ser Arg Gly Cys Leu Asp Ser Leu
1410 1415 1420

Pro Asp His Asn Gln Leu Ser Leu Phe
1425 1430



<210> 185 
<211> 199 
<212> PRT 

<213> Thermus thermophilus 
<400> 185 

Thr Pro Lys Gly Lys Asp Leu Val Arg His Leu Glu Asn Arg Ala Lys 
1 5 10 15

Arg Leu Gly Leu Arg Leu Pro Gly Gly Val Ala Gln Tyr Leu Ala Ser
20 25 30 

Leu Glu Gly Asp Leu Glu Ala Leu Glu Arg Glu Leu Glu Lys Leu Ala 
35 40 45 

Leu Leu Ser Pro Pro Leu Thr Leu Glu Lys Val Glu Lys Val Val Ala 
50 55 60

Leu Arg Pro Pro Leu Thr Gly Phe Asp Leu Val Arg Ser Val Leu Glu 
65 70 75 80 

Lys Asp Pro Lys Glu Ala Leu Leu Arg Leu Gly Arg Leu Lys Glu Glu 
85 90 95 

Gly Glu Glu Pro Leu Arg Leu Leu Gly Ala Leu Ser Trp Gln Phe Ala
100 105 110 

Leu Leu Ala Arg Ala Phe Phe Leu Leu Arg Glu Met Pro Arg Pro Lys 
115 120 125 

Glu Glu Asp Leu Ala Arg Leu Glu Ala His Pro Tyr Ala Ala Lys Lys 
130 135 140 

Ala Leu Leu Glu Ala Ala Arg Arg Leu Thr Glu Glu Ala Leu Lys Glu 
145 150 155 160 

Ala Leu Asp Ala Leu Met Glu Ala Glu Lys Arg Ala Lys Gly Gly Lys 
165 170 175 

Asp Pro Trp Leu Ala Leu Glu Ala Ala Val Leu Arg Leu Ala Arg Pro 
180 185 190 

Ala Gly Gln Pro Arg Val Asp
195 



<210> 186 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer
<400> 186 

gcccagtacc tcgcctccct cgagggg 27

<210> 187 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 
<223> Description of Artificial Sequence: PCR primer



<400> 187 

ggcccccttg gccttctcgg cctccat 27



<210> 188 
<211> 331 
<212> DNA 

<213> Thermus thermophilus 
<400> 188 

agactcgagg ccctggagcg ggagctggag aagcttgccc tcctctcccc acccctcacc 60 
ctggagaagg tggagaaggt ggtggccctg aggccccccc tcacgggctt tgacctggtg 120
cgctccgtcc tggagaagga ccccaaggag gccctcctgc gcctcaggcg cctcagggag 180
gagggggagg agcccctcag gctcctcggg gccctctcct ggcagttcgc cctcctcgcc 240 
cgggccttct tcctcctccg ggaaaacccc aggcccaagg aggaggacct cgcccgcctc 300
gaggcccacc cctacgccgc caagaaggcc a 331 



<210> 189 
<211> 110 
<212> PRT 

<213> Thermus thermophilus 
<400> 189 

Arg Leu Glu Ala Leu Glu Arg Glu Leu Glu Lys Leu Ala Leu Leu Ser
1 5 10 15

Pro Pro Leu Thr Leu Glu Lys Val Glu Lys Val Val Ala Leu Arg Pro
20 25 30

Pro Leu Thr Gly Phe Asp Leu Val Arg Ser Val Leu Glu Lys Asp Pro
35 40 45

Lys Glu Ala Leu Leu Arg Leu Arg Arg Leu Arg Glu Glu Gly Glu Glu
50 55 60

Pro Leu Arg Leu Leu Gly Ala Leu Ser Trp Gln Phe Ala Leu Leu Ala
65 70 75 80

Arg Ala Phe Phe Leu Leu Arg Glu Asn Pro Arg Pro Lys Glu Glu Asp
85 90 95

Leu Ala Arg Leu Glu Ala His Pro Tyr Ala Ala Lys Lys Ala
100 105 110
<210> 190
<211> 31 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 190 

gtggtgtcta gacatcataa cggttctggc a 31 

<210> 191 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR Primer 
<400> 191 

gagggccacc accttctcca ccttctc 27 

<210> 192 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR Primer 
<400> 192 

ctccgtcctg gagaaggacc ccaag 25

<210> 193 
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<220> 

<221> primerbind
<222> (15) 

<223> S at position 15 can be either C or G 
<220>

<221> primerbind 
<222> (27) 

<223> S at position 27 can be either C or G 
<400> 193 

cgcgaattca acgcsctcct caagacsct 29

<210> 194 

<211> 31 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 

<400> 194 

gacacttaac atatggtcat cgccttcacc g 31 

<210> 195 
<211> 38 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer
<400> 195 

gtgtgtgaat tcgggtcaac gggcgaggcg gaggaccg 38 

<210> 196 
<211> 10 
<212> PRT 

<213> Deinococcus radiodurans 
<400> 196

Val Ile Leu Asn Pro Gly Ser Val Gly Gln
1 5 10

<210> 197 
<211> 10 
<212> PRT 

<213> Methanococcus jannaschii 
<400> 197

Tyr Leu Ile Asn Pro Gly Ser Val Gly Gln
1 5 10



<210> 198 
<211> 10 
<212> PRT 

<213> Thermotoga maritima 
<400> 198 

Leu Val Leu Asn Pro Gly Ser Ala Gly Arg 
1 5 10

<210> 199 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 199 

ctggtgaacc cgggctccgt gggccagc 28

<210> 200 
<211> 10 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: polypeptide 
<400> 200 

Leu Leu Val Asn Pro Gly Ser Val Gly Gln
1 5 10

<210> 201 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 201

ctcgaggagc ttgaggaggg tgttggc 27



<210> 202 
<211> 9 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: polypeptide 
<400> 202 

Ala Asn Thr Leu Leu Lys Leu Leu Glu 
1 5 



<210> 203 
<211> 32 
<212> PRT 

<213> Deinococcus radiodurans 
<400> 203 

Gly Phe Gly Gly Val Gln Leu His Ala Ala His Gly Tyr Leu Leu Ser
1 5 10 15

Gln Phe Leu Ser Pro Arg His Asn Val Arg Glu Asp Glu Tyr Gly Gly
20 25 30



<210> 204 
<211> 32 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 204 

Gly Phe Asp Gly Ile Gln Leu His Gly Ala His Gly Tyr Leu Leu Ser
1 5 10 15

Gln Phe Thr Ser Pro Thr Thr Asn Lys Arg Val Asp Lys Tyr Gly Gly
20 25 30
<210> 205
<211> 32 
<212> PRT 

<213> Pseudomonas aeruginosa 
<400> 205 

Gly Phe Ser Gly Val Glu Ile His Ala Ala His Gly Tyr Leu Leu Ser
1 5 10 15

Gln Phe Leu Ser Pro Leu Ser Asn Arg Arg Ser Asp Ala Trp Gly Gly
20 25 30



<210> 206 
<211> 32 
<212> PRT 

<213> Archaeoglobus fulgidus 
<400> 206 

Gly Phe Asp Ala Val Gln Leu His Ala Ala His Gly Tyr Leu Leu Ser
1 5 10 15

Glu Phe Ile Ser Pro His Val Asn Arg Arg Lys Asp Glu Tyr Gly Gly
20 25 30



<210> 207 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 207 

catcctggac tcggcccacc tcctcaccga 30



<210> 208 
<211> 9 
<212> PRT

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: polypeptide 
<400> 208 

Ile Leu Asp Ser Ala His Leu Leu Thr
1 5 



<210> 209 
<211> 33 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 209 

gaggaggtag ccgtgggccg cgtggagctc cac 33

<210> 210 
<211> 11 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: polypeptide 
<400> 210 

Val Glu Leu His Ala Ala His Gly Tyr Leu Leu 
1 5 10

<210> 211 
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 211 

ggctttccca tatggctcta cacccggctc ac 32
<210> 212
<211> 29 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR primer 
<400> 212 

gcgtggatcc acggtcatgt ctctaagtc 29



